So in playing with fluid.bufstats~, I can see there are a lot of powerful things it can do, and @tremblap suggested some useful things for calculating the “shortness” of a sound after the plenary, but I’m still not sure how best to leverage it for other purposes.
So I thought having a forum post that talked about some approaches and use cases would be useful.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
My first use case is figuring out how long a sample sounds, to keep as a statistic for querying a database.
@a.harker suggested getting a time centroid by sending irstats~ a center message. This, very usefully, gives the moment in time where half of the sample’s energy lies on each side (I think). This is better than my initial idea of using an RT60 measurement, which I was told was problematic.
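To make the “(I think)” concrete, here is a minimal sketch in plain Python (not Max) of two possible readings of a “time centroid”: the energy-weighted mean time, and the point where the cumulative energy reaches half of the total. I don’t know which of these irstats~ actually computes, so the function name and both definitions are my own assumptions rather than its documented behaviour.

```python
def energy_centroid_and_half_point(samples, sample_rate):
    """Return (centroid_ms, half_energy_ms) for a mono list of samples.

    Hypothetical helper, not part of irstats~ or FluCoMa.
    """
    energies = [s * s for s in samples]
    total = sum(energies)
    if total == 0:
        raise ValueError("input is silent, no centroid to compute")

    # Reading 1: energy-weighted mean of the sample index.
    centroid_idx = sum(i * e for i, e in enumerate(energies)) / total

    # Reading 2: first index where the running energy reaches half the total.
    running = 0.0
    half_idx = 0
    for i, e in enumerate(energies):
        running += e
        if running >= total / 2:
            half_idx = i
            break

    to_ms = 1000.0 / sample_rate
    return centroid_idx * to_ms, half_idx * to_ms
```

The two numbers can differ quite a bit for sounds with a sharp attack and a long tail, so it is worth checking which one a given tool reports before comparing values across tools.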
Then I’d use fluid.bufstats~ and take the “mean of the derivative” (more confusing sounding than it is, since it’s just the first value returned by @numderivs 1) to see the change over time. @tremblap also suggested taking the standard deviation of the derivative, though I’m not sure how to make ‘real world’ sense of that one.
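As a way of making sense of these two numbers, here is a rough sketch in plain Python, assuming the derivative is a simple first difference between consecutive analysis frames (I haven’t confirmed that this is exactly what fluid.bufstats~ does, nor whether it uses a population or sample standard deviation). The function name is made up for illustration.

```python
import statistics


def derivative_stats(series):
    """Mean and population std of the first difference of a per-frame series.

    Assumes 'derivative' means frame[n+1] - frame[n]; hypothetical helper.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return statistics.fmean(diffs), statistics.pstdev(diffs)


# A decaying envelope: every step goes down, so the mean difference is negative.
mean_d, std_d = derivative_stats([1.0, 0.5, 0.25, 0.125])
```

Under that assumption the mean of the derivative is just (last value - first value) / (number of frames - 1), so it describes the overall slope, in the descriptor’s own units per frame. The standard deviation then says how unevenly that change happens: near zero for a steady ramp, larger for something that drops suddenly and then sits still.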
So between those three stats, I’ll probably come up with some kind of weighting to get a single number of “long-ness” per sample.
So at this point I have a question about weightings and aggregate statistics. If I want to weight together three numbers that are in different units, what would be a good way of doing so? The “time centroid” is a number in ms (or samples), and I have no idea what units the mean and standard deviation of the derivative are in (normalized amplitude, I guess, for audio?).
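One common way around the mismatched units is to standardize each statistic across the whole sample set (z-scores) before weighting, so every column ends up unitless with a comparable spread. Below is a minimal sketch in plain Python; the function names, the equal default weights, and the choice of z-scoring over, say, min-max scaling are all assumptions of mine, not something suggested in the thread.

```python
import statistics


def zscores(values):
    """Standardize a column to zero mean and unit (population) std."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant column carries no information, so it contributes nothing.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def longness_scores(centroids_ms, deriv_means, deriv_stds,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three standardized statistics, per sample.

    Hypothetical helper; each argument is one value per sample in the corpus.
    """
    columns = [zscores(centroids_ms), zscores(deriv_means), zscores(deriv_stds)]
    total_weight = sum(weights)
    return [
        sum(w * col[i] for w, col in zip(weights, columns)) / total_weight
        for i in range(len(centroids_ms))
    ]
```

The resulting score is only meaningful relative to the corpus it was computed on, and a statistic where larger means “shorter” would need a negative weight (or its sign flipped) before being combined.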
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Now the next use case would be to try to extract meaningful pitch information from a sample. In my specific use case the samples themselves are monophonic, or rather, should only contain one pitch per sample with nothing changing over time (with regard to pitch).
They are, however, metallic sounds, so they sometimes have odd harmonic structures (example attached below).
Since I’m in no rush for these analyses, I’m using a tiny hop size and a medium-sized window (@fftsettings 1024 32 @algorithm 2). I guess I could probably go bigger for the window size, but there aren’t really low pitched sounds here. Would there be any downside to using something like @fftsettings 8192 32 for pitched metallic sounds where I’m only interested in pitch?
Now the data I get back looks something like this.
First the pitch value:
And the confidence is this:
Now some samples aren’t quite as consistent as this, but my thinking and questions are more about how to computationally extract the “correct” pitch from this data.
So given that my samples shouldn’t change over time, I probably don’t need the derivatives (right?)
What statistics are meaningful to extract here? It looks like the median of pitch would work in this particular example, but should I weigh that against a confidence metric? So perhaps something like taking all of the points at which the confidence is at or above its 80th percentile, and then taking a median (or mean?) of the actual pitch from that reduced dataset?
Is something like that possible with fluid.bufstats~?
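To spell out what I mean by that, here is a small sketch in plain Python, done outside of Max on lists of per-frame values. It assumes pitch and confidence arrive as two equal-length lists, uses a nearest-rank percentile, and keeps frames whose confidence is at or above that threshold. The function names are made up, and this is not a description of anything fluid.bufstats~ does internally.

```python
import statistics


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding float rounding issues.
    rank = max(1, -(-(pct * len(ordered)) // 100))
    return ordered[rank - 1]


def confident_median_pitch(pitches, confidences, pct=80):
    """Median pitch over frames whose confidence is >= the pct-th percentile.

    Hypothetical helper; pct=80 is just the figure from the post.
    """
    threshold = nearest_rank_percentile(confidences, pct)
    kept = [p for p, c in zip(pitches, confidences) if c >= threshold]
    return statistics.median(kept)


pitches = [440, 441, 439, 880, 100, 440, 442, 3000, 440, 441]
confidences = [0.9, 0.95, 0.92, 0.2, 0.1, 0.97, 0.93, 0.05, 0.96, 0.94]
print(confident_median_pitch(pitches, confidences))  # 440
```

Because the threshold is relative to each sample’s own confidence values, this always keeps some frames, even for a sample where the confidence is poor throughout. An absolute threshold would behave differently there.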
Here is a more problematic example from the same sample set.
Pitch:
Pitch zoomed all the way out:
Confidence:
(negative confidence?!)
So for a sample like this there is a high confidence bump in the middle, but oddly it does not correspond with a plateau in the pitch information.
The pitch data also jumps around all over the place, and other than that flat bit in the middle, I would be worried that a vanilla median wouldn’t do the trick here. Percentiles would be weird too, I think.
This poses a trickier example I think.
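One idea for cases like this, sketched in plain Python, is to report a pitch together with a flag saying whether the high-confidence frames actually agree with each other. The sketch converts Hz to MIDI note numbers so the spread is measured in semitones, uses an absolute confidence threshold (which also discards any negative values), and checks the median absolute deviation against a tolerance. The 0.8 threshold, the 0.5 semitone tolerance, and the function names are arbitrary assumptions for illustration.

```python
import math
import statistics


def hz_to_midi(freq_hz):
    """Convert frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def pitch_with_reliability(pitches_hz, confidences,
                           min_confidence=0.8, max_spread_semitones=0.5):
    """Return (median_midi_pitch, is_reliable) over confident frames.

    Hypothetical helper. Returns (None, False) when no frame passes.
    """
    kept = [
        hz_to_midi(p)
        for p, c in zip(pitches_hz, confidences)
        if c >= min_confidence and p > 0
    ]
    if not kept:
        return None, False
    centre = statistics.median(kept)
    # Median absolute deviation: a spread measure that ignores a few outliers.
    spread = statistics.median(abs(m - centre) for m in kept)
    return centre, spread <= max_spread_semitones
```

A sample that comes back flagged as unreliable could then be tagged as “unpitched” in the database, or set aside for checking by ear, rather than being given a misleading pitch value.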
I’ll also attach this sample as a point of reference.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
So again, not exactly sure what the best way to go about this stuff is with stats, but just presenting a couple specific use cases which it would be good to understand better, and hopefully have others post similar “problems” here, which we could collectively find “solutions” to.
easy.zip (229.9 KB)
problematic.zip (66.8 KB)