Trying to extract the ratio of moments of silence vs sound in a recording

Hi there,

Before I start implementing anything, I’m trying to work out what to analyze in a sound recording, and how (from a buffer or, preferably, on the fly with a kind of accumulation method as described here), to get a single numerical value quantifying: moments when the loudness is greater than a threshold (sound moments) / moments when the loudness is smaller than a threshold (silent moments).

Basically, the idea is, if loudness were normalized (btw, can we do that ?)

  • if loudness is smaller than 0.1 during 45s in total,
  • if loudness is greater than 0.8 during 30s in total

then, that value would be 0.1/0.8 which would mean 1 / 8 which would mean that “this sound recording” is MUCH MORE NOISY/LOUD than this other one, whatever its length, which got a value of 5/8.

Doing that on the fly could work like the thing I described here:

  • sampling the output of fluid.loudness~
  • considering 0.1 and 0.8 as static values (probably annoying)
  • if the output is < 0.1, bang something that increments a counter
  • if the output is > 0.8, bang something that increments another counter
  • in the end, when recording stops, compute the ratio of the two counters.
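
The steps in the list above can be sketched outside Max as a small streaming counter. This is only an illustration in Python, not a patch: it assumes the loudness values have already been scaled to a 0..1 range (the hypothetical from the post), and the class and method names are made up for the example.

```python
class SilenceSoundCounter:
    """Counts frames below a low threshold and frames above a high threshold.

    Assumption: loudness values are already normalized to 0..1.
    """

    def __init__(self, low=0.1, high=0.8):
        self.low = low
        self.high = high
        self.silent_frames = 0
        self.loud_frames = 0

    def push(self, loudness):
        # One call per sampled loudness value, like one bang per frame.
        if loudness < self.low:
            self.silent_frames += 1
        elif loudness > self.high:
            self.loud_frames += 1
        # Values between the two thresholds are ignored.

    def ratio(self):
        # Silent frames divided by loud frames; None if nothing was loud.
        if self.loud_frames == 0:
            return None
        return self.silent_frames / self.loud_frames


counter = SilenceSoundCounter()
for value in [0.05, 0.9, 0.5, 0.02, 0.95, 0.85]:
    counter.push(value)
print(counter.silent_frames, counter.loud_frames, counter.ratio())
```

Because it counts frames rather than seconds, the ratio does not depend on the length of the recording, as long as the values arrive at a steady rate.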

Any ideas ?

I have moved your thread into this new category - this feels like a good question with a good straightforward answer.

I can think of a few ways of doing this… but maybe I misunderstand the question.


----------begin_max5_patcher----------
838.3oc0XtsaaCCC.8YmuBAisWFRyrjuuGF5+wPQgunjnBaICa41rUz9sOcw
FsY0QQIwcc6gXGwHKxCIkHcdbgiaNaGtyE7MvO.NNOtvwQIRJvYXriac1thp
rN0zbKX00XJ2co9233cbk7NRElVfA7ss3tsrpxwYzjwK1RnatsEWv0ZBl5sx
ao3Vj5VnZDRbEbyvCQJUKJK+tqPwiqDsulPqvbkg.eQHqmOJ0SJ7oEKjWVdg
DI3.yMPAJMQY2991PQzGDEeFvVCpX8kf0sY03NPGQFlDOGGrGgUDJtf0SUOF
x.1dwRPCPIuBaoOXRrCemwdcESrHFhRwPs0FJukpFfPGvX8MZrMYRGHG2dKl
lkWgUl7vusl0VmozXzjnM5O0h3+rAqsNW2k.27L5FWvMmA8T7CBC+Mwb7tlV
vmVi.eUbEB9B.54s5n9H+TjJhB8M5jfS5jPShM7fXeV3l2y4LposjQwuZKIJ
X75Tf.SO6nskPdAw095bb6wiX9pH1vIOGHfAimINeuxgOFqvfD6gM7ebXmdC
awVwJhAF2jBCBzG3pB8QFOIC5eBG65aBUhrjh91rBrrPiEY3nDanEMIsgSRa
vgncfzwOyMwbP9woMLdjSCzBOgX66wYSSyWGtB.Od5KR21fefQD8NgZLHiHt
7bq0b1ItifFGc7L2z+CRb+N3p.OKNWR2LuejIdCmidGla.WW0SJWIaQlh65d
FbsHUVPGXTB35GHzR1CcjegAIIHOv0aYMpQAAPabMHcOxIl697D1Var5zr5b
xJKd93Dpyx8CMtodN3qirglUIob3a6wpRspWj4OdkVkkKkuuCni02VLtzisO
.dw5KwcbBMiSDMc9xjjc.CfGvIaslhsUSdWnlFWDyZxeFzjMNO49kWMIVao3
vzC9teV6M8rP0oygyzFDmCe43hXTSg+sBZvoCZvKCwTaTszii9PTc5zZ9xxT
sh48iI0jxFlnn2vYZvP+UgxWGR8OMcYtAqRmimizYnsw5KVS1bZ2azitxQVS
y831tgIqTgnH4cLUbOYoZHgpGpZRysEeOYb9pFzbyZE0H4hBj8sJyxcWTfq9
QYhLHZOYH8U.mPkpBvT4eLWSllCUc5EOs32.kRC96A
-----------end_max5_patcher-----------

I pasted what I did beside yours.
Yes, using only one threshold is interesting.

----------begin_max5_patcher----------
1616.3oc0assiaaCD8YueEDBsuT3rk2kTQQa9OZBBjknWq.aICI5r61fju8x
axq8FegThIM9g01jqrl4LyvyvYn7muaVxh1mD8If+.7OfYy97cylYlROwL23
YIaJdpbcQu4xRJa2rQzHSla+eRwSRy7XfbUmneU65p+9cMuqYUQWEXgnrXWu
.TKAUhshlpdPai5BEf91cMUC2j00MhR0Dl6D0M4V0cSInBYcayGN5JxbWQyt
Ms6jqERilAG9dExxU0MO7gNQozhLFieOaNfPf2CmCPDj9MtZ.38ubqpaFtSH
2j0UFj0t3iugvRzy8k6tS+x7IZq5DU6JUpnwRrbc6i.kIqoUBTfTz2Wz8LXa
W6hhEqe9jlHxUMQrfMQXh1Dgx4FSDhoeilFhIhLFSj5ttPzkbR0E6l0Nk74s
BqtljLGjrnn4gj8J2oPTJzhgT8aLy.LND.gS1e26J1HjhtOHZTNEiZ.+oBrH
HhOMzhyiNZEOptweSvuDrDrLPafZMRgTaHre3RVBNyXAvVuOBNBKAMIlnEdZ
rhNGVqajWwUioFLQvFONNX.hxhH.W.vA5LMAy9DTmyQZNoAfRBFnbZTYs07r
IgRrlgLYaX1ElH2JyPx8vwwDE80qEMkhfwAmxzNCGACOMbXfhILV2tqJXLji
NLkP5HbEvwfAEok5F+8f1OEkePvUd3bcrrKx569eKa61TXDIOdDG0QlWLijG
PFP7IrE7jedQmySOdzQSiH5DOssC7KKQfeW8JNPn5UDMwRWZ8ogAzXl9tWrF
fFY9sKGsZYeHtzBvQ3PwQDm+E3M3XuQkAHRFMDIvaAWYJaptRbLck+I3M.Zr
8kCXbB9xHBQSYtmqPJ5kv3bahU8edF9Rw9AY1IVgdqAYmWdBPNloPenPJhYp
ENyzTCWKN37fCgSiZcg0f5.Yibdyq4E4Lh08kO1JfyhLsK7bDuDupQ7ZDvt5
DQVW5UAL4DDviJQir8gGVKhLSqy6wsU6RGd0+h64wtONWoRxW56av0igxIV9
F7XqMFmF29x1KjAiBBzTgOkkOZTviIJ9UP6Rft9XvRc.POnuV6lTeOI3HDdT
GiwgBarsXZNid.rIjPfM6mrpooV.Ybm4nvabH4+shocEjgMEjg.+lgELp0kY
sMnL6wCPBeOtip8OK1IksMABjq5ncMG1shcLjr42NGMf0ywXYi7fAPo2PGCh
cs6D.K6GzofTtRgEA3bKRIWxL349.QTGyr83NCmOCcyUylKVGl5ChOQ4KHbT
2h+hXSbYwGMaO8andTzsPCUPzTK8rAsDZv4ZPvas.WGjYP7Hibyiay.idCjF
7o1fVR3EeyhH.WtdWc085sH2H56+J3spPYklBFlA71GqapZeru9eEfrLLD71
UsaMinTDbDYt7hnFa2Bc1H14YDMNEUkeMPD1W+PSwZMNcexCzZiwIrv4vNDq
lukoPlW8PLYDrd9iM.8s65JGTa2g9LG7hvqD8x5FyCTyAWDkptHD38mzH6qj
no9JI3TkD0CIoO4uoKIODj9AK4fKZSc01VEaQ+PWVyMGqL21B.Hlse39uRaW
kh+8r6+xWk0GOs9Y947JaJ6Pkkcfp+s5JZR5pOwJ3XDqf8wERtjQgysObc1x
nblHhsbpoqZnqoZvK5vTYcdQ2bwZwP2PYdX1zAb3yoZHHw7D2QQVcy10RynX
GL4kxRCPYywWRYm1pT8wPdUuNJFjWXnO9P5khuxSMYwPYlFIb3noFeQ7P2RO
xR8ZWFy9.cZqUHEg1OJ1drLObX3qrL0pq1VeOXFYjoaF8gyOFwRYd5tlbLK1
SdmoJHhOKNn3XrGBrmrSSUPdsSuKtlZH+uswRGNJ1qo7ZGb7KoqCarxpqGNJ
1oW3TeMrWiGcXOD3rXkml6ydHdUM.uV2xsOK71M8kQyNXSeSaIlO6wKFqvP9
HnXHGeDDJJRxGNCTLHby8QPrKE+fr+3Sz+FTlLr8QaxOZibQhOxKIa7J3XSu
v7QxGG4Eq8M6yJT12CI6U3MIFKjfdFQMUA4kWDdZa4DKnvK5unXL8ppcTLjD
yWIcbWqrsKqX61OI55cWsQHIaJ9Xa29ebcJNrF6PSqlS5Depd35Ms7NonqbU
sTTJ20YTrjmzoZMe0VkaqYWsaUfBdJQZ55Xi9oQXagEIllSd2Wt6+.Hj7iSH
-----------end_max5_patcher-----------

This one could be enough, if we define THE silence level, static etc.

But actually, I’m trying to get something that measures correctly whatever the overall sound level is.

I mean, if someone speaks at a globally low level, sometimes silent and sometimes talking (quietly), OR if someone speaks LOUD, sometimes silent and sometimes talking (LOUD)… something adaptive could help.

I was also thinking about measuring the minimum and maximum, but that means KEEPING / accumulating all values (this is why I was speed-limiting). With that big list, I could normalize it, then count all frames under 0.1 and all frames over 0.9, for instance. That would give me the low-level and high-level parts. It could also be done with only 0.1: if smaller, then silence, else not silence, and I’d count the frames that are smaller and the frames that are greater.
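
A rough sketch of that accumulate-then-normalize idea, written in Python rather than as a patch. It assumes the whole list of loudness values has been kept, uses plain min-max scaling, and the function name and the 0.1 / 0.9 thresholds are just the illustrative ones from the paragraph above.

```python
def normalized_counts(values, low=0.1, high=0.9):
    """Min-max normalize a list of loudness values, then count quiet and loud frames.

    Returns (silent_frames, loud_frames, total_frames), or None when the list
    is empty or flat (nothing to normalize against).
    """
    if not values:
        return None
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return None
    # Map the quietest frame to 0.0 and the loudest frame to 1.0.
    normalized = [(v - lo) / span for v in values]
    silent = sum(1 for v in normalized if v < low)
    loud = sum(1 for v in normalized if v > high)
    return silent, loud, len(values)


# Hypothetical loudness values in dB, one per analysis frame.
print(normalized_counts([-60, -58, -20, -18, -40]))
```

One thing to keep in mind with min-max scaling is that a single very loud or very quiet frame sets the whole range, so the counts can shift a lot because of one outlier.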

this is called ‘de-trending’ - the idea is the same as the one in FluidAmpFeature and FluidAmpSlice, explained here:

https://learn.flucoma.org/reference/ampslice/

In this case you would keep a long list for the trend (100 points in my patch would be 10 sec) and a short list (10 points would be 1 sec), and then compare the averages of the two. When the short-term average is higher than the long-term average, the last second is louder than the trend.
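
Here is a minimal Python sketch of that short-list versus long-list comparison, for readers who want to see the logic outside of Max. It assumes one loudness value arrives per analysis hop; the class name is made up, and plain moving averages stand in for whatever smoothing the patch actually uses.

```python
from collections import deque


class Detrender:
    """Compares a short moving average against a long one (the trend)."""

    def __init__(self, short_len=10, long_len=100):
        # deque(maxlen=...) drops the oldest value automatically.
        self.short = deque(maxlen=short_len)
        self.long = deque(maxlen=long_len)

    def push(self, value):
        """Add one loudness value; return True if short-term is above the trend."""
        self.short.append(value)
        self.long.append(value)
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)
        return short_avg > long_avg


# Tiny windows so the behaviour is easy to follow by hand.
detrender = Detrender(short_len=2, long_len=4)
flags = [detrender.push(v) for v in [-60, -60, -20, -20]]
print(flags)
```

Counting how many frames return True versus False then gives a sound / silence proportion that adapts to the overall level of the recording instead of relying on a fixed threshold.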

I hope this helps

btw your code is fine too, apart from one typo, plus one explanation

typo : be careful in your < - 40 you have a space between - and 40

explanation: you ask why I put [change 0.] and suggest ‘reducing the flow’ - the answer is a little more complicated. The Fluid objects output a value every signal vector, even if the value has not changed. So in my case, where I set @hopsize 4410, you will get many repeats: if your signal vector size is 64, like the default, the value only changes every 68 or 69 vectors (4410/64 ≈ 68.9). In effect, [change] lets you ignore all those repeated values.
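
To make the arithmetic concrete, here is a small Python sketch of what filtering out repeats does to such a stream. It is only a rough analogue of [change], not its implementation, and the function name and input values are invented for the example.

```python
def drop_repeats(values):
    """Keep a value only when it differs from the one just before it."""
    out = []
    first = True
    previous = None
    for v in values:
        if first or v != previous:
            out.append(v)
        previous = v
        first = False
    return out


hop_size = 4410
vector_size = 64
print(hop_size / vector_size)  # how many vectors fit in one hop

# One analysis value repeated for each signal vector until the next hop.
stream = [-30.0] * 69 + [-25.0] * 68 + [-25.5]
print(drop_repeats(stream))
```

Note that this kind of filter also drops a new analysis frame that happens to have exactly the same value as the previous one, which matters a little if you are counting frames.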

I hope this helps even more :slight_smile: