Spectral "Compensation"?

rodrigo.constanzo · June 29, 2020, 7:13pm

Ok, did some more testing in context.

Since I’m querying based on on descriptors, for the most part, the source and target are kind of similar, so fairly mild filters are produced.

Things like these:
Screenshot 2020-06-29 at 7.54.48 pm

Screenshot 2020-06-29 at 7.54.41 pm
(the normalization here is for display purposes only)

Since the corpus is finite, however, it means that sometimes I get stuff like this:
Screenshot 2020-06-29 at 7.54.36 pm

Where I have a linear amplitude (per band) of like 9.0, which is pretty chunky. I’ve cracked up the regularization, but I’m not sure what the limit to “a little bit” is, so the highest I’ve gone is 0.50, which is waaaay higher than the 0.08 I was initially using. Don’t know if this starts to fuck the numbers too much at some point or if there is such a thing as overregularlization.

So my options are regularize more and/or clip the compensation filter, but I’m thinking that I may want to try to compensate spectral shape, but decouple it from the amplitude compensation.

The last version of the maths that @a.harker posted estimates the loudness of the playback with and without the applied filter, weighted by the k-weighting, which I guess includes an overall loudness compensation already?

If I want to remove the loudness compensation part of that filter, would I then normalize the resultant filter in a loudness-weighted way (doing something like maxgain + an additional pass of k-weighting… weighting) to get a filter shape that would be the same loudness as if the filter shape wasn’t applied? Then do loudness compensation in a completely separate way.

I tried a couple of permutations of this, but I kept running into a weird/crazy bug that I’ve seen with vexpr a lot over the years, where I get stack overflows without any feedback. defer seems to fix the problem, which leads me to believe that under certain circumstances vexpr doesn’t like getting numbers from a high-priority thread, but I’ve never been able to isolate this problem (even though I can 100% reproduce it in a larger patch). All that is to say, it’s been tricky figuring out the normalization step as half of the things I try result in a stack overflow.

a.harker · June 29, 2020, 7:40pm

The last version of the maths that @a.harker posted estimates the loudness of the playback with and without the applied filter, weighted by the k-weighting, which I guess includes an overall loudness compensation already?

The whole point of this calculation is to produce a multiple that removes the loudness changes induced by the spectral filter - or in other words to make filter that compensates the spectrum but leaves the loudness unmodified - for which we must know the loudness before and after, but it is only an estimate, and won’t be as accurate as the method used otherwise (by the way the k-weighting is part of the loudness estimate, not a separate thing). That is the decoupling you are talking about here - you don’t need to do it again. There is some fairly fundamental understanding of what each stage of the process does exactly (in conceptual, rather than technical terms) that I’d suggest you try to get straight in your head. At the moment you seem to have misunderstood the goal of the maths I posted.

rodrigo.constanzo · June 29, 2020, 8:22pm

My bad. When we spoke the approach morphed a couple of times, so I wasn’t clear on what the final version is doing here.

In that case, I’ll mess with the regularization and try clamping to see if I can keep it sane in a wider set of cases now that I have a patch that does it with loads of different samples.

rodrigo.constanzo · June 29, 2020, 11:14pm

Aaaand got something working well. Regularized with a medium 0.2 and clamped the filter with clip 0. 4. so nothing jumps out like crazy.

It sounds good! Surprisingly the different time scales don’t sound too crazy when applied either. I think doing the like-for-like compensation sounds the best, but since the final filter goes through some smoothing and averaging, those crazy spikes tend to get flattened out some anyways.

Reworked the variable spectral compensation amount too so when you go negative (i.e. inversion) you scale things up some, since things weren’t getting boosted as much as they were being cut, in the inversions.

I’ll try to work out a variable amount of compensation in dB as I’m doing it in linear amplitude at the moment out of maths knowledge/efficiency (the way I was doing it before no longer works with the kind of output I’m getting).

rodrigo.constanzo · June 30, 2020, 8:57am

Ok, got a version of this working nicely in dB land.


----------begin_max5_patcher----------
3671.3oc0cs0qaabD9Y6eErB8sdNB68K8olzzlBTm1fZWDTXDb.kDOxLlhTf
j53iSPxu8tWHoHk3kU5rRVQvPldWdYlu8alc1YGJ+Ku9UyVj8bTwrf+bv6Cd
0q9kW+pWYZR2vqp92uZ1lvmWlDVXNsYoQeJawOM6NaWkQOWZZtLXQvi0stMr
b4GhSW+PdzxR6MGy3yo2EfXh4TIjRnTBgvkPttQooKzbPvOVcGR2sINMIpz7
Lg6aLaWYcqnpViWYD.kPcOoV.rmV4m2FYe5yVDltd1cAydLIKTIQ+n9z90W+
Z8W28xT7eNIHOZ8HZNhoTr6BvP9bn.CjPfPHoBLQpZTn6ZXMG4plCGTy0Z84
ovahJJBWGcjFeuXNGhYBZv8j4LDjAQA2CoyoH.ExCtmNmiQbBI.NGHf.nLfL
Gg.blHfOmIoPFQ0BgPAzlArs4QEQokgkwYoOjDmFsLaWp4wAYSBsPBaNBBXD
EFqQVFkcWfhgcNXK7XrkMB35UfkoPGgfavUJDQwhf6QyILIFoPX1bLVx3vf6
kyYb82RMXpwdEZxHRJybH.PAbnC.KcRfUx0eqrS8DRh49FI2kTFWjDuJJeTK
P3bo5CUp.Flhsfz7CkJJobNR8GDVQa3XkNR.ZUkyZopwqSyT2qj3ke7.mQt5
gpP8uK+bRz.nBBUK6Ew+r4jDM5Rd3lnxn7GhRCWXuAfILz2+LUxmBrr8ZTKh
Vq7mmOEjrUciU21o89QTDSLiAEHHW40moH52EPwiyqvtxq7MsJI9on4qhCSp
uuOElmpFILc91sJkKeee8MYGgpmQCJIVc27s3EO8lxE2rQXFvwXF0S7USOBe
JZ0Cgkk4wK1UFs+nhJXoBWzpdxtnrGqatt81hPbZboFrLOMJXud12Ycr.24r
Ttn9nFpKF7DxRWOzXQmSbSE4GBzj+95ONU2+8CcBEeHKuzoGUMV22cYmRsar
9o19Mzw5+xOFiaCVlsYaTZgwS+Wsw3ke5fRXpIq4DlxzDiIXNEQTzVrul4DM
5LmskNk+6C3cOFmD8TTdgRaZSElEtcaql6RI2D9SY4sbfpaJN01zdtTdzSw0
WOso0vbEFUp.nc41A7mYj8i0axTSwjtK1bmpGDqEIyfigztMbo8h0ig0cuG0
oRS3J.iaAU.vl4ZnsbLnF7Wmjs7iQqZSmlkoFXiSaOUdmtWE8XnZhvGdLKsr
dJDHZdu8+XkD1amMj8uJeuyuWMacd7prTsPzYjP2b8i68pvIL5FssxXNizvs
8bwJ5gBVFnyBkRtqXQXtdfpxYAptyxrrjtc0bcIQOVV0813zzCPwxrsC2Yd7
5OLx0tHS04lwt2ldJTl61dePwIJeP6ls64EljTYA281+bXZ7lvxnxX6P.Bzz
o0g4GJVlmkjzQes87TO8rRwwWF8o3Ukev7fZSFTmd71ZRzrlQ4UwqiJJ61VY
35htsz3JqUS6VTYC+PYzlsIJsn6InrNhKJUtS+TQ0IVSzZC.6WFZaa51tC6z
9XtE65Z7w1s1iuPnTNmKa8gpm6V4fjzpMgf02BUGz43vNHOzIIkzV9NxQY6U
r1Zti5oL7LVUF7Xvj3ERsPEv9Op0Bp.EXO3kpmoAL3f.FZ..C6.fc20F49C2
qW8EX9TfGUgSs.OLTnyM.YtJlX8GjZ0d.HRsVjiW0kGHaB7MDYqXYXRjdEDv
45ucB95k6wj8v8.NferSF+jiheWYnyfXMPWvjnmN17VePHtNDPPOfG4R.dxa
DvKbQwzXEhfOdVArjeLXYWe245ja.rhHtQMTuGp+qyB75ioYWhrmIZD9sjUp
E2z+AbdrNhhg0iIJbNmdfwruwQzsANtZQYVnC91TKpRZS0WGnBKu.FnXxsA1
nh6X5YMAUofh0COhAu.A0hA2PtuTmW7lcabGmTqL+HbhSdIgiMPnrXnagxFm
dMWBPRPhSFaDf37VizIGxOdBGQ2cEM3dJ54s4A+wGgA+I02nf+h1KeXtNCQA
cFNGNiJi6BizSLFP8VR3GqzAxRzgHN91v8ly1tXRUFMAWIaWD51z1cwzVuji
WFfMI3pUh9hBlEO.Tgmz7sdC58KTYj4oPCrMhAVcDWPSDCGGEgEf.igLfSN4
OSFoZqtWlsYSTZ2TzYteoqhdtUlouZPWuYN65gctsDoSA.QdD.sBzDHHQdL4
iwwmGBNrc4.N4ocVP9YQu1mrecwELPJbMJtt+9gshrc4KqG1pcsFzUQWEUTF
m1r2Gue+BTBHN5x3jkBkWq.jCRQs3d4jBnCRgNOxWVovkQDcNdtXRgdUENIE
3JP6xHEPWkBvkDKPNxKpE2KmT37HxESJbkchuj1H027osTuj1HDzovNuT1H5
Lv4lGbwkDKDmx7HWJof5pG7ZP6xIEtfEhKI6j5rMxkzeA0U1I5hJEtxNunyr
KNoY1uTwZUS5llcdr3lkuxV9PfurhlAfv8KZvKhnIcUzvC6eqpw5pYppx.sk
myHkE3IUFIqSxVDlbPsezWUlbbYwYTttqsvVkbGhUU0J2QXjonI269cHWu6w
mipGOWeRcbsMvyA5omyT5itx6pmb+rePPGTHjOzHF7JAcX9U5AgbQi3d3A4h
B0d8ocLqNnVJ0OiCpgxCpexiqcxgqaxCqYRiGFa8icfo79jjuaUb1aME+2Ce
WT5NqGh5RTrKRrX8iwIIKyRxNpzQqSBzLauMkoY84pKGdjj.0kdCXNFh45WH
.0QpCL0909jZXtFX8EQnDI.oOUBCwITyQBDlh0GAN3xP6eV.nz9D.RAfXOR0
D7N86Hy9KKLccUwNxaUwdayy1lk2T0mywxlyeWY157vUwUosAzw+4cUDobUu
GV1d12bh5qoMwqYv3M6VFN1.P6s2PQz92aiRCdaXZQvai1DuHKYU6WFgDEAQ
o.cFB3Hf40..LGx3HKDJ..HoCpXRts45dHNUyViZfUrDAMvJpYbAwEDbWX0I
lRCJN.YwwAvtzEQCIQvATgvdzQB3gzEFEXgESAgwsGYuSWD1RELEtbopiN5M
VCrFkkTKSZSDcasA3NWCgSgXiRijXgzdDUMKr7vQkNWlDPfbyyRcMDtQqEDj
jatATVy0onRaNTN4BENYAZ.THrGoDYDp5Y5EChuSEMzTFD0kW86aMCc6MbSu
GydSf99up7AykANA6T6FQterKIR4AeOXZ19G0vgZHfZIgJtq9UixISRy0V+0
umsAc9x7rQ3Abk1uBF8Xe5lmht3njpm5yZTQ3.ChpAT7T1mVntui7mI1O7t+
9IPk+GedUd15nz2YXPcdeRruoisoO5o4Myw.QPJy5USOxZl1P9kmn5lu1u3g
VzOOULNOkB3LinAXTLzvSgpgAzXyi.0gkYc9qImlgIdc.ZCySgLrhVadBbBS
JqNBbvipuYQHLHDXioShXDCpRkPkYuOo3lMHLd4IPy+1nznmBm8B791uIwKx
YcOgVu2mKVxsdLwxJeM5lnX+Pk6nDhlf.O9nyx.XLdtbbZNoxMZeGMb7RNdQ
GvymBi8IW8q2UVp.nQXrGG.gnYzDTcj2EquoohptgDpuMLN82bTpTzsFWaGe
j+EshOEq7CLlvcDoFV4aX+WWDYBcSIT+y1+hBzuL0+JZOCG.8oK8awOn65dm
KaJIwm30ahVcZ1gtpEriNYeJ1p0ykG+7xx7jWfz6UAR+Bi+aimzqSy8wsBC4
eksJpXpPd5ed1yY19a.EdZWHSXb6UoY2lEmt+rimz7BHZee7xxowpu3imeu9
cuu7lKZi2tLaaj6garOq0Rr9Suw9+kCjeW3B2UEGmpsOlyjdQ7pRoVs1hICf
tmUSL3LkWwHgdW1Z8pl7V3riLnf4p0sCsWAACMK+iTssJ9Tk9ualXikthv6O
D9TziY4a9M+FpIXNs+oOuNK2XUX9GuOU+6ow8lDOLlx0elIHMlk7lM1oYKdD
FszCIrCNA0y64pv6IqqatJZxKzIlD69b9LYVh8BS4mpJID.n5EmxG2JzorYH
KRhTdls6a4Wmk8wg2XjQyPKXxjssOU8Gcf+R7laR3uKHyGtf.p2nbwoebT2+
GMtuOSzLf15o8QsMi52QMSAI.Qyb4XyU5R5YY1GYy2dHCstKSWQWjddmINxK
nCC.8s0ZSeUm54eXFbcz6buIFYjmlWLS9ncMY+ftzd9RGrThNThoCg9vbi.j
jppOPJwBd0QbgCKPf0DrAsw8YsEf+zqMgOyZ9chqec5z2pmdnxhtqzgWYNNH
07fARf278pU3VuVgtrJODlhjcmGSa038wgN+j75qwBODZpnYi8O9nQb9xE5p
R6tdO5JNk9KoN05Zc31nX6W4RWS5VqZ5y6w2ZeSs+5jcQ0+.c67p+wPd09Xu
2Yjz6tfpDv7rOkdxR30wIoUB+qeN7zEP.RRs6hOmiwHqsCBpBmy+B32lGEcF
RnA8ZREGjoiIv+B22o7rkVFdxhmf.XLaRBY.kvYfRlDJo9WF+OQqNY4i2DJJ
rY8NfpYL7s78+hRRx9z3h3wEuEUB.hQmaFQs0CCW.El3qUWCkPuXJ.52iJfY
6K91rjyfhzj8BpJ9Gpsvf7dYqscW91jyK3AdS0zQfTlvlKZsafwKyILf.rAF
gn.lzXDfDTFxq0pTdzxn3mFem65iSzrdl85DnccI5GgytDEMsv4Y7YDkCMY0
DA7ppLVwVk3QQ6qCKpHJcUws1LH6AY+a60a4oB3P.2tEUDFPVUG3bnW4Nkga
OgrD9lvxrf2nWE5Lu87KJidbWRR4osAK5WZEtARYMwpqSEPSIJNPQQJnTnMM
5LLGaemIX.y+OX3SP0pTWGcBI3BavI3F5OFIwRwsx56K0uGj2Ruu.eRmMmOe
BD+uINu7yA+s0YS8p7.zCglUYhqF4Tl+UagfSu3.NVI68moTnTBrQmpnzjps
bmo+sUtasc2EXwryCXsu5bC7a9+q+0W++AHf2C7G
-----------end_max5_patcher-----------

rodrigo.constanzo · July 12, 2020, 1:17pm

Now that I’m making strides in the “using a KDTree to predict the future” idea from this thread I’m revisiting the idea of compensation, but in the context of MFCCs.

From what @tremblap (and @weefuzzy) have mentioned on the forum (and elsewhere) the 0th coefficient closely correlates to loudness. I even remember @tremblap posting a comparison example, but I couldn’t seem to find it now.

That led me to wonder if that 0th coefficient would work as a vector for loudness compensation (ala the approach mentioned much earlier in this thread).

At the moment, when I do this with vanilla loudness, I have an incoming target, I query for the nearest match, I subtract the difference between the two in loudness and then apply that as an amplitude offset after the fact. This works really well, and is the basis for the spectral compensation discussed in this thread.

So I’m wondering now if the 0th coefficient would play nice with a similar approach. Either by subtracting the incoming samples 0th coefficient from the corpus one, and manually adjusting the amplitude of the matched sample, or in the use case from the predictive querying thread, manually offsetting the 0th coefficient when querying the second part of the process.

Perhaps/obviously using a standalone loudness descriptor might be better, or perhaps leaving “loudness” out altogether, might be better for the predictive part of the analysis since I’m more interested in spectral morphology than I am loudness, as I can take more info for that from the real-time “real” input.

rodrigo.constanzo · January 3, 2021, 6:35pm

Returning to this today, and double-checking something.

In your p e(njw) subpatch, you have a hardcoded 48000:
Screenshot 2021-01-03 at 4.39.09 pm

In order to update this for other sample rates, that number would need to be different (so in most of the examples I’ve posted (where I’ve not changed that number)) the loudness compensation hasn’t been correct. And it can presumably just get its info from dspstate~?

weefuzzy · January 3, 2021, 7:16pm

It’s hard-coded to that because the coefficients in the standard are only given for 48k:

To get equivalent coefficients for other sampling rates would require a bit more reverse engineering. This seems to be the bit of Gerard’s code that initialises the filters for arbitrary sample rates

github.com

flucoma/flucoma-core/blob/97f68a56e9d773d67a556e78cc937252b4e77f89/include/algorithms/util/KWeightingFilter.hpp#L22


#include "../util/AlgorithmUtils.hpp"
#include "../../data/FluidIndex.hpp"
#include <cmath>
namespace fluid {
namespace algorithm {
class KWeightingFilter
{
public:
  void init(double sampleRate)
  {
    // from https://github.com/jiixyj/libebur128/blob/master/ebur128/ebur128.c
    // Shelving filter
    using namespace std;
    double f0 = 1681.974450955533;
    double G = 3.999843853973347;
    double Q = 0.7071752369554196;

Which, I guess, is based on what this library does

rodrigo.constanzo · February 27, 2021, 2:02pm

Super old bump here, but to clarify this subpatch here. I initially read this to mean that I can send arbitrary frequencies in and it will give me the correct gains at that band, regardless of the sample rate that I’m at (e.g. 44.1k). And that the expr (-$f1 * 6.283185 * ($f2/48000.)) inside the p e(njw) has a hardcoded 48k because that is what the coefficients are at in the standard.

Now I’m starting to think that that’s not the case, and that this only works if I’m at 48k in the first place? Or is this a generalizable bit of code that is agnostic of the sample rate that you’re operating at?

weefuzzy · February 27, 2021, 3:08pm

I refer the right honourable gentleman to my previous reply

rodrigo.constanzo · February 27, 2021, 3:38pm

The:

part is what threw/throws me.

Does that mean it “works” (at 44.1k), or it “works” (at 48k as the coefficients only work for that sr)?

weefuzzy · February 27, 2021, 6:45pm

For that patch, yes, but upthread I show how coefficients can be calculated for general SRs.

rodrigo.constanzo · February 27, 2021, 7:13pm

lol yeah, that was the literal previous reply… I was scrolling way up to re-read the posts from back when we first spoke about this. It’s quite a long/dense thread!

What’s even funnier is that I have essentially bookended the answer by the same question asked two months apart.

So let’s agree to meet here again 2 months from now!

weefuzzy · February 27, 2021, 8:08pm

It’s a date

rodrigo.constanzo · February 28, 2021, 12:29am

In the interim, and while I’m necro-bumping this thread, I’d like to run the maths by you (@a.harker) real quick.

At the moment I have the maths you posted above implemented like this where each inlet is an incoming list consisting of the gain of 40 melbands:

If I forgo the k-weighting compensation (for now (and/or as a selectable option)) I think that means I should be doing this instead?:

This seems like it’s putting out the same/similar results (as in, I’ve not fucked anything massive), but I’m not super clear on what’s happening mathematically here to compensate the spectral contours.

Does that look about right?

a.harker · March 5, 2021, 6:48pm

It looks about right, yes.

Top vexpr avoids divide by zero/small numbers
Next one divides one set of band amplitudes by the other (that sorts the spectral contour)
Bit on the bottom right aims to normalise the loudness or amplitude (remove the overall amplitude change from the spectral compensation, which would otherwise try and match the amplitude of each band in the sample playback to that of the target, thus also changing the volume.

Is that clear?

rodrigo.constanzo · March 5, 2021, 7:00pm

Yeah. Well, much clearer at least.

When tidying up I realized that I never 100% understood what each step was doing (other than nominally).

rodrigo.constanzo · February 7, 2025, 5:38pm

Ok, a bit of a bump here (4 years!).

So following on from how effective creating a filter from spectral moments (as outlined in this thread), I’m having a hard time figuring how how to create a compensatory filter in a similar way as outlined in this thread.

As a bit of a reminder/summary of all the stuff above, the idea was to take a filter contour (from 40 melbands) for a source and a target, then using some maths (from @a.harker) to create a filter to make the source sound more like the target. Like so:

That’s a (jongly) kick as the source, and a (jongly) snare as the inputs, and the filter at the bottom is the filter that will make one sound like the other.

Here’s the same process with the sounds in reverse positions:

So what I’m trying to do now is a similar thing, but using spectral moments, which are in different units, ratios, ranges, etc…

Here are the same sounds but visualized with spectral moments:

If I run the same numbers through the code above, I get numbers that don’t make sense, which I don’t know if that’s because the numbers here aren’t normalized or because the numbers don’t necessarily translate in the same way since they aren’t individual bins with amplitudes being modulated.

So unpacking it a bit, looking only at centroid. If I want to have a sound with a centroid of 60 sound like a sound with a centroid of 102, do I need to apply a filter to it with a centroid of 142?

Similarly, if I have a sound with a skewness of 1.1 (tilting “right”) and want it to sound like a sound with a skewness of -0.8 (tilting “left”), do I apply a filter to it with a skewness of -1.1? ((-0.8 - 1.1) + -0.8).

Or does this need to be worked out per column/descriptor in some odd perceptual way?

OR is this not possible at all?

tremblap · February 11, 2025, 5:17pm

now, I took time to reply because I have a more philosophical question: if you want to sound like the source, why find something different? I know it might sound flippant, but spectrum compensation via a small tilt, or a small eq, is ok, and you have that already. Pitch and loudness matching too. I reckon the more you mess with the file the more the process’s sonic identity will take over?

rodrigo.constanzo · February 11, 2025, 5:45pm

When doing loads of different types of compensation (loudness, pitch, spectral, morphological), it does start getting to a point where the source is arbitrary, and I’m not terribly interested in pushing it that far.

BUT I have found spectral compensation, in the context of corpus query/matching to be very expressive as even with a small amount of samples/options, you can still apply some timbral variety from the audio input. I do this presently with melbands which works ok, but gets very phase fucky. And seeing how effective the spectralshape-derived filter sounds, I want to do that in a better way.

But back to the core philosophical question, I have been thinking quite a bit lately along similar lines to your oldschool LPT idea where you can have some hierarchical “macro” descriptor types that can each be made up of bespoke deep-learned descriptors, or dimensionally-reduced, or whatever, and then be able to do any combination of querying with and transferring over with them. So decoupling querying from transference.

A simple example would be using descriptors that don’t encode loudness (e.g. MFCCs sans 0-th, or spectral shape) to do your querying/regression/classification/whatever, then independent of that choosing to apply loudness-compensation to the end result.

A more complex example would be to query with pitch/loudness and morphology (but not timbre), then apply spectral/loudness compensation to the end results.

Can also open up possibilities for clever treatment of pitch/chroma (ala what you suggest in your surfing the waves paper) where you can query with pitch, find the best example, then compensate via chroma/octaves/modulo/detuning etc…

I sent you a quick phone vid of some of the recent stuff I’ve been working on with a buddy, but we’re getting really good results with morphology as a low-latency descriptor (i.e. using timescale regressors to predict longer analysis windows/morphology from short ones).

////////////////////////////////////////////////////////////////////

As to the original question, I don’t think it will be possible to just numerically map like I was thinking in my previous post as each compensation would have a huge aggregate effect likely skewing things way off.