Something I’ve used in C-C-Combine (based on @a.harker’s suggestion) is loudness and pitch compensation, where I compensate for the discrepancy in loudness/pitch between a matched sample/grain and what I am matching it against. For loudness and pitch this is fairly straightforward, in that I can boost/cut the amplitude and play it back faster/slower to more-or-less “accurately” compensate the sample.
This works really well, particularly in terms of pitch compensation.
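For reference, the compensation itself boils down to something like this (a minimal numpy sketch; the function name and the dB/MIDI values are just for illustration, not what C-C-Combine literally does):

```
import numpy as np

# Hypothetical per-grain compensation, assuming loudness in dB and pitch in
# MIDI note numbers for both the target and the matched sample/grain.
def compensation(target_loudness_db, match_loudness_db,
                 target_pitch_midi, match_pitch_midi):
    # Linear gain that closes the loudness gap between match and target
    gain = 10 ** ((target_loudness_db - match_loudness_db) / 20.0)
    # Playback-rate change that transposes the match onto the target pitch
    rate = 2 ** ((target_pitch_midi - match_pitch_midi) / 12.0)
    return gain, rate

gain, rate = compensation(-18.0, -24.0, 60.0, 57.0)
# -> boost the grain by roughly 2x and play it back roughly 19% faster
```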
Now I’m wondering how viable it might be to do something similar but for spectral compensation. Since fluid.spectralshape~ and fluid.bufspectralshape~ output all the spectral moments, and combined, they describe a more comprehensive spectral shape, which could potentially be translated into a subtle/vague filter shape (with filterdesign~?).
Obviously it can’t be as absolute as loudness/pitch compensation, since there is more risk of overly/incorrectly compensating the match.
Have any of you experimented with this?
Is it viable enough to do on a per-grain basis?
Hmm, that could be another way, though it could potentially make things worse, since I would want an IR of the difference between the two samples anyway, right? (@tremblap / @a.harker?)
Also, I don’t think it’s possible to do that in a “real-time” manner anyway, since you’d have to load the IR into a buffer~/multiconvolve~, and that takes time. @tremblap has a patch for something similar-ish that solves the problem by the IR always being one attack late: whenever an onset happens, it cues up an IR…which is what the next onset will use.
It could potentially make things worse, and also just not be the kind of sound result you want - I’m spitballing ideas here. I’m thinking of a more cross-dog implementation with a mixer/balance control to let you latch onto some aspect of the sound that you want.
My initial thought was in a concat resynthesis context, where the grains come by so fast that you don’t “hear” individual grains, and therefore any spectral massaging they get would just get washed into the overall texture. So in that context, just nudging it a bit could maybe help with getting a more accurate spectral contour as well.
I’ve not tried to do this specifically, but thoughts:
Spectral shape descriptors might not be what you want: they (sometimes) tell you enough to distinguish one sound from another, but because the higher moments (skew etc) are calculated with respect to the centroid, trying to interpret them as describing a spectrum would give you a funny shape.
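To make that concrete, the moments stack up something like this (a small numpy sketch, not how the FluCoMa objects are actually implemented):

```
import numpy as np

# Centroid is a weighted mean of the bin frequencies; spread, skewness and
# kurtosis are all measured relative to the centroid, which is why they
# don't translate directly into a filter curve.
def spectral_moments(mags, freqs):
    p = mags / np.sum(mags)                      # treat magnitudes as a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    skewness = np.sum((((freqs - centroid) / spread) ** 3) * p)
    kurtosis = np.sum((((freqs - centroid) / spread) ** 4) * p)
    return centroid, spread, skewness, kurtosis
```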
Modulating filters per grain can be done; biquads will tend to blow up if you wiggle them too fast, and SVFs are more stable for this sort of thing. Check out @a.harker’s examples in FrameLib.
Likewise, real-time, dynamic convolution can be done with FrameLib, but possibly isn’t what you’re after here, like you say.
What you want, if I understand it, might be more like spectral envelope replacement (a bit) like a vocoder? Again, if I was going to try and do this per-grain in Max, I’d reach for FrameLib, and use an estimate of the spectral envelope of my target frames to stamp on the incoming ones.
At this point it’s just a hunch or an idea to test, since part of the fun of the concat stuff is the “lossy-ness” of it, but it could be nice to have the nuance if desired.
I still haven’t worked my way through @jamesbradbury’s tutorials, so I’m not sure how to actually go about doing something like that in FrameLib, but increasingly the solution to most of my problems seems to be “you can do it in FrameLib”, so it might be time to do that…
I’d heartily recommend it – I’ve been loving the per-grain control fun. I might have some code knocking about where I’ve done spectral envelope estimation in FrameLib, so give me a shout when you’re at that point and I’ll dig about.
Hello - I’ve been thinking about this for a while.
I’d advise making a small IR by deconvolution of the desired envelope by the one you are using and then smoothing/simplifying the result.
The issue is how to get the envelope of the sounds involved. One option is to simplify the spectrum with irplapprox~ or something like that (other options are cepstral techniques or LPC). For real-time input the problem is that you don’t have the filter/sound until it has happened, so that is an issue.
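If it helps to see one of those options concretely, here's a minimal numpy sketch of the LPC (autocorrelation method) route; the frame is assumed to be a windowed mono numpy array, and the order value is just illustrative:

```
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz

# Rough LPC spectral-envelope estimate: solve the autocorrelation normal
# equations, then evaluate the all-pole filter's magnitude response.
def lpc_envelope(frame, order=20, n_bins=513):
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    coeffs = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    a = np.concatenate(([1.0], -coeffs))     # A(z) = 1 - sum(a_k z^-k)
    _, h = freqz([1.0], a, worN=n_bins)      # envelope shape = |1 / A(e^jw)|
    return np.abs(h)
```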
I don’t advise wiggly filter things - I think what you want here is one filter per grain. FrameLib would make it more viable to do this sort of thing with sample accuracy, but I’m not sure it is accurate to say that loading an IR into multiconvolve~ “takes time”, except in the sense that all calculations/operations take time. With respect to the Max thread you are running in, it is immediate. That is not immediate in terms of the audio thread, but that’s another matter.
I’d be up for making a synthetic spectrum from the moments also, to try and approximate the shape - that would be a fun task in line with my current interests. The problem here is that we have to decide on the level of complexity/some characteristics of the synthetic filter shape - for centroid and spread that seems fairly doable, but the higher order ones need more thought.
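For centroid and spread alone, the crude version would just be a Gaussian bump on the magnitude spectrum - something like this toy numpy sketch (the parameter names and defaults are only illustrative):

```
import numpy as np

# Toy "synthesise a shape from the moments": a Gaussian bump whose mean is
# the centroid and whose standard deviation is the spread. Matching skew
# and kurtosis too would need a richer parameterised family of shapes.
def synthetic_shape(centroid_hz, spread_hz, n_bins=513, sr=44100.0):
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    return np.exp(-0.5 * ((freqs - centroid_hz) / spread_hz) ** 2)
```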
So one intended implementation of this idea would be to include it in C-C-Combine, where for any given analysis frame there would be an input frame (already analyzed) and its nearest match from the database (then using a lookup to get the difference between the two).
So in that case there would be a known frame for both.
If I understand you right, would that then entail something along the lines of using irtrimnorm~ (or fluid.bufcompose~) to get the audio from the incoming audio buffer (using a JIT approach in C-C-Combine) into its own buffer~, and then running irplapprox~ on it to get an IR which I can then apply to the playback grain?
I meant more in the sense that it may involve buffer~ operations before/after it, which I guess have the same issues as the threading/overdrive stuff from this thread.
That would be quite cool.
I tried whipping something up using biquad~ and taking the raw readings and going all wiggle-waggle on noise~ to see if it works, and I guess I could kind of hear what was happening, but obviously this is not a great way to go about it.
Unless it’s crazy complicated, I’d be curious to see it either way if it’s handy.
OK. Here is my first patch of this thread. This is using the HIRT and is not real-time friendly - this is just a super-simple “make the overall spectrum of this sound more like the overall spectrum of that sound” patch. The key is in the cropping (which, in this context, is in essence a kind of smoothing), and also in the actual smoothing, which controls the detail of the filter.
This would not quite yet be possible in FrameLib. There isn’t any smoothing like this in FrameLib yet, and the minimum phase conversion could probably be written in FrameLib, but it’s not simple if you’re not familiar with the technique.
Here’s a more complicated version that allows you to approximate the shape of each spectrum first with piecewise linear approximation. It’s probably not worth the extra hassle of calculation, because the results are largely similar, and when you reduce the number of segments right down you’ll start to get big errors in a less controllable manner than with smoothing alone. However, it’s presented here for reference.
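For a rough sense of what reducing the number of segments does, here's a crude numpy stand-in (just sampling and re-interpolating the magnitude at a few breakpoints, not the proper fitting that irplapprox~ does):

```
import numpy as np

# Crude piecewise-linear reduction of a magnitude spectrum: sample at a
# handful of evenly spaced breakpoints and interpolate back up. Fewer
# segments means a cruder, and eventually misleading, shape.
def piecewise_linear(mags, n_segments=8):
    x = np.linspace(0, len(mags) - 1, n_segments + 1)
    y = np.interp(x, np.arange(len(mags)), mags)
    return np.interp(np.arange(len(mags)), x, y)
```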
1 - it should be noted that the process corrects the spectrum including the overall amplitude, so this should sound at roughly the same level.
2 - the question now is how much like the spectrum of one sound you want to get from the other. Is the smoothing adequate control over accuracy? Probably the next stage would be to look at much more approximate parameterised models, but that gets a bit complex. Looks like the Pearson distribution would allow control over the moments of a synthetic shape, but the maths is a little complex. I can take a look.
3 - whatever method you use, it basically comes down to (a rough numpy sketch follows the list):
A - approximate both spectra
B - divide one (the target) by the other (the match)
C - apply the filter to the match
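```
import numpy as np

# A non-real-time numpy sketch of steps A-C, assuming `target` and `match`
# are mono numpy arrays; the moving average stands in for the smoothing
# control, and everything here is illustrative rather than what the HIRT
# patch literally does.
def smoothed_spectrum(x, n_fft, width):
    mag = np.abs(np.fft.rfft(x, n_fft))
    kernel = np.ones(width) / width
    return np.convolve(mag, kernel, mode="same")

def match_spectrum(target, match, n_fft=4096, width=32, eps=1e-6):
    t_env = smoothed_spectrum(target, n_fft, width)    # A - approximate both spectra
    m_env = smoothed_spectrum(match, n_fft, width)
    correction = t_env / (m_env + eps)                 # B - divide the target by the match
    # Turn the correction into a linear-phase FIR (the HIRT patch would do a
    # minimum-phase conversion instead), then filter the match with it
    ir = np.roll(np.fft.irfft(correction), n_fft // 2)
    out = np.convolve(match, ir)
    return out[n_fft // 2 : n_fft // 2 + len(match)]   # C - apply, compensating the FIR delay
```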
Other methods would include AR modelling and an IIR filter as a result, or LPC/other approximation techniques.
@weefuzzy can we get your spectral envelope stuff - I might try comparing a few methods.
Here’s a vocodery thing using the cepstrum to estimate the spectral envelope with FrameLib. It’s doing things @rodrigo.constanzo probably doesn’t want, like whitening (by dividing the carrier by its own spectral envelope). To apply a sort of summary spectrum to a carrier, one could do a one-component NMF on a target and apply the smoothed envelope of that.
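Outside FrameLib, the envelope-estimation part of that is the classic cepstral smoothing trick - something like this in numpy (n_coeffs controls how smooth the envelope is; the values are just illustrative):

```
import numpy as np

# Cepstral spectral-envelope estimate: log-magnitude spectrum -> cepstrum ->
# keep only the low quefrencies (liftering, mirrored because the cepstrum of
# a real log spectrum is symmetric) -> back to a smoothed log spectrum.
def cepstral_envelope(frame, n_fft=1024, n_coeffs=30, eps=1e-9):
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + eps)
    cepstrum = np.fft.ifft(log_mag).real
    lifter = np.zeros(n_fft)
    lifter[:n_coeffs] = 1.0
    lifter[-(n_coeffs - 1):] = 1.0                  # mirror of quefrencies 1..n_coeffs-1
    smoothed_log = np.fft.fft(cepstrum * lifter).real
    return np.exp(smoothed_log[:n_fft // 2 + 1])    # one-sided envelope
```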
Thanks @weefuzzy. I’m confused by the greater-than / less-than in the spectral envelope. It looks like you are expecting a mirrored full spectrum in, but your FFT input is real only (so it doesn’t have a mirrored spectrum).
Possibly because you are doing the iFFT of a real only signal you get a mirrored result and so your cepstrum is mirrored, is that it? Can you clarify?
Yes, that’s it. The > and < are doing the liftering on the cepstrum (i.e. zeroing coefficients), and because it’s symmetrical, I need to treat it accordingly. If, in that sub-patch, you probe down the left-hand side with a fl.tomax~ -> multislider, you get a pretty good visual idea of what shapes the smoothing produces.
I am recoding it for my own edification with a slightly different logic (I create an index vector that is symmetrical and lifter based on a < operation on that).
In doing so, it looks to me like you don’t correctly maintain symmetry in your code. I think when you do the length minus the threshold “bin” you end up out by one. A mirrored spectrum has a Nyquist bin that appears only once (and thus the symmetry is not central in the vector), but here we expect an exact mirror with a repeated central value. If I examine the vector produced by your code, I get one more “bin” of values on the left-hand side compared to the right.
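For what it’s worth, the index-vector version in numpy terms looks like this (the mirror of quefrency n in a length-N cepstrum is N - n, with bins 0 and N/2 each appearing only once), which keeps the symmetry exact:

```
import numpy as np

# Symmetric lifter built from an index vector: index runs 0,1,...,N/2,...,2,1,
# so a simple < comparison keeps quefrencies 0..n_coeffs-1 and their mirrors
# without any off-by-one on either side.
def symmetric_lifter(n_fft, n_coeffs):
    n = np.arange(n_fft)
    index = np.minimum(n, n_fft - n)
    return (index < n_coeffs).astype(float)
```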