Spectral "Compensation"?

After getting distracted by a couple of issues I’ve gotten somewhere pretty good with this.

Some final details to sort out, after which I’ll post some code and stuff, but I wanted to show some of the results from the points @tremblap made last week.

So what I’m doing now is asking for 40 melbands between 200-10k and then processing/filtering that down to 8 bands, which will then feed a chain of cross~es.

The first thing I did was apply 5-point smoothing to the envelope. After struggling with the results of zero-padding (particularly since at @numframes 256 my low-end resolution was dogshit and was clumping a ton of energy into the first bin), I decided to mirror the edges to keep the center of mass around where it should be for the first/last frames. (I wish this was an option for fluid.(buf)spectralshape~!)
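For anyone following along outside of Max, here is a minimal Python sketch of the idea: a 5-point moving average with mirrored edges next to the zero-padded version. The function names are made up for illustration, and I'm assuming the mirror reflects about the edge value without repeating it, which may not be exactly what the patch does.

```python
def smooth_zero_padded(values, width=5):
    """Moving average where the edges are padded with zeros."""
    half = width // 2
    padded = [0.0] * half + list(values) + [0.0] * half
    return [sum(padded[i:i + width]) / width for i in range(len(values))]


def smooth_mirrored(values, width=5):
    """Moving average where the edges are mirrored instead of zero-padded.

    Assumption: reflect about the edge sample without repeating it,
    so [a, b, c, ...] is padded to [c, b, a, b, c, ...].
    Needs len(values) > width // 2.
    """
    values = list(values)
    half = width // 2
    left = values[half:0:-1]            # e.g. [c, b] for width 5
    right = values[-2:-half - 2:-1]     # e.g. [y, x] for [..., x, y, z]
    padded = left + values + right
    return [sum(padded[i:i + width]) / width for i in range(len(values))]


# Made-up envelope with most of its energy in the lowest bands.
bands = [10.0, 8.0, 6.0, 4.0, 2.0, 1.0, 1.0, 1.0]
print(smooth_zero_padded(bands)[0])  # first band gets dragged towards zero
print(smooth_mirrored(bands)[0])     # first band stays near its neighbours
```

With zero-padding the first and last outputs are averages that include two zeros, which is what pulls the edges down; mirroring avoids that.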

Once I had that, I needed to get it down to 8 bands. So I compared “sampling” every 5th bin vs taking an average of every 5 bins and spitting that out.

Also included for comparison is a straight average of 5 bins taken directly from the raw (unsmoothed) output (rightmost display).
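In case the difference between the two reductions isn't obvious, here is a rough Python sketch of both, going from 40 values to 8. The function names and the `offset` parameter are just for illustration; which bin in each group gets "sampled" is my assumption.

```python
def every_nth(values, step=5, offset=0):
    """Keep one value per group of `step` (offset picks which one; assumed 0)."""
    return list(values[offset::step])


def average_groups(values, size=5):
    """Average consecutive, non-overlapping groups of `size` values."""
    out = []
    for start in range(0, len(values), size):
        chunk = values[start:start + size]
        out.append(sum(chunk) / len(chunk))
    return out


# 40 made-up band values standing in for the melband output.
forty = [float(i) for i in range(40)]
print(every_nth(forty))       # 8 values, one picked from each group of 5
print(average_groups(forty))  # 8 values, each the mean of a group of 5
```

Sampling keeps whatever happens to sit on the chosen bin, while averaging folds the whole group into one value, so a narrow peak between sample points survives (diluted) in the second version and vanishes in the first.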

Here are some of the results:

There are cases where the “every 5th” holds up well, but I think the “average 5” seems to best represent the overall contour (short of doing PLA).

The straight “average 5” from raw isn’t terrible, but in the 2nd example it turns the peaks and troughs into a plateau, which the other versions don’t do so much.

I also compared the smoothing in linear and log domains. The results aren’t massively different, but they respond very differently to empty bins (which is what prompted this error/thread).
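To illustrate the empty-bin thing, here is a small Python sketch that averages the same five bins in the linear domain and in dB. The amplitude-to-dB conversion (20·log10) and the -120 dB floor are assumptions for the example, not values taken from the patch.

```python
import math


def to_db(amp, floor_db=-120.0):
    """Amplitude to dB, clamped to a floor so an empty bin doesn't give -inf."""
    if amp <= 0.0:
        return floor_db
    return max(20.0 * math.log10(amp), floor_db)


def mean(values):
    return sum(values) / len(values)


# Five made-up bins at unity amplitude, with one empty bin in the middle.
bins = [1.0, 1.0, 0.0, 1.0, 1.0]

linear_then_db = to_db(mean(bins))              # average amplitudes, then convert
db_then_average = mean([to_db(b) for b in bins])  # convert, then average dB values

print(round(linear_then_db, 2), round(db_then_average, 2))
```

Averaging in the linear domain barely notices the empty bin, whereas averaging dB values lets the floor drag the whole group down, and by how much depends entirely on where the floor is set.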

Now I’m at the point where I’m trying to optimize things (which prompted this thread) and wanted to compare to see what kind of results I get from this smoothing/downsampling vs asking for 8 melbands directly.

My initial problem with this was that if I specified a range of 200-10k, the first of the chosen melbands started at 484.957172 Hz and it was already at 1300 Hz by the 3rd band. This is compared to starting at 371.274232 Hz for the same range when doing the smoothing/downsampling.
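For reference, the 484.957 Hz figure is what you get if the band centres are equally spaced on an HTK-style mel scale, using numbands + 2 points between the min and max frequency and taking the inner ones as centres. That layout is an assumption inferred from the numbers, not something I've confirmed against the FluCoMa source. A rough Python sketch (function name hypothetical):

```python
import math


def mel_band_centres(lo_hz, hi_hz, n_bands):
    """Centres of n_bands filters equally spaced on an HTK-style mel scale.

    Assumption: n_bands + 2 equally spaced mel points between lo_hz and
    hi_hz, with the inner n_bands points used as the band centres.
    """
    def hz_to_mel(f):
        return 1127.0 * math.log(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (math.exp(m / 1127.0) - 1.0)

    lo, hi = hz_to_mel(lo_hz), hz_to_mel(hi_hz)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * k) for k in range(1, n_bands + 1)]


centres = mel_band_centres(200.0, 10000.0, 8)
print([round(f, 2) for f in centres])
```

Because the spacing is fixed by the min/max frequencies and the band count, the only way to move the centres is to change the requested range.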

So I decided to “fudge the numbers” by massaging the frequency range I’m asking fluid.bufmelbands~ for, so it more closely lines up with the frequencies I’m after.

Using the process above, I end up with these bands:

371.274232 753.952319 1269.088749 1964.959113 2905.690638 4177.751245 5897.994634 8224.419765

If I ask for 8 bands directly, but change my min/max frequencies to 100-12k, I get this:

387.683717 778.82 1310.610312 2033.635117 3016.662999 4353.192273 6170.343679 8640.95117

Which is pretty damn close. (this also saves around 0.1ms!)
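To put a number on "pretty damn close", here is a quick Python sketch that compares the two lists of band frequencies quoted above, as a percentage and in cents. The variable and function names are just for illustration.

```python
import math

# Band frequencies (Hz) quoted above.
cooked = [371.274232, 753.952319, 1269.088749, 1964.959113,
          2905.690638, 4177.751245, 5897.994634, 8224.419765]
direct = [387.683717, 778.82, 1310.610312, 2033.635117,
          3016.662999, 4353.192273, 6170.343679, 8640.95117]


def offsets(reference, candidate):
    """Per-band (percent, cents) offset of candidate relative to reference."""
    out = []
    for ref, cand in zip(reference, candidate):
        ratio = cand / ref
        out.append((100.0 * (ratio - 1.0), 1200.0 * math.log2(ratio)))
    return out


for band, (pct, cents) in enumerate(offsets(cooked, direct), start=1):
    print(f"band {band}: {pct:+.2f}% ({cents:+.1f} cents)")
```

On these figures the direct bands all sit a few percent above the cooked ones, by less than a semitone in every band.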

And if I compare the output of both approaches, I get this:

These actually look really good, and I think capture more of the “extreme changes” that can be present in the original (obviously, since it’s not smoothing everything).

There are some anomalies, like the 3rd example here. The native 8 melbands version has a huge dip in the middle, which I guess corresponds with that particular band being centered on an area of low energy or something, but the “cooked” version works much better for that one (visually!).

So all of that is to say, I may just go back to requesting 8 melbands directly and “fudging” the range I’m asking for to get bands centered closer to where I would like.

Once I tidy things up a bit more, and finish testing/working some stuff out I’ll post code/comparisons.
