I think, other than the transient extraction one, these aren’t algorithms we have natively in flucoma. I guess the loudness one is a spectral gate of sorts, though it sounds really clean. And the pitch one has a few options, like the fundamental, even/odd harmonics, etc…
I wonder if they did the stuff in-house or if it’s licensed algorithms from iZotope or something like that. I did see some speculation about that, since the plugins are being sold separately and aren’t part of a .0 release.
Lastly, I love the UI of how the splits themselves are handled. Bitwig is definitely eating Ableton’s lunch in that regard. Live looks like Winamp in comparison.
All of these are pretty uninteresting to me personally except for the loudness splitter. My gut feeling is that it’s a spectral gate with a time threshold, so only stuff that exceeds a threshold for a certain time gets split. That’s obviously an incredible simplification of what is likely to be going on, but it definitely does feel better than just gating spectral bins by their magnitude or something…
I guess a big part that’s cool about it is how it’s built into the overall plumbing of the DAW, so you can route audio around in an intuitive/musical way, inside each of the devices.
To my ears, the algorithms sound pretty clean as well, though there are more sophisticated ears on this forum for that sort of thing.
I’ve not played properly with Bitwig (and when I did, it was a very early version), but it is very cool how modules can split stuff up and then be processed independently.
As for how to do odd / even splitting, never looked at it before, but this DaFX paper shows how it can be done with a pitch tracker and a fractional delay comb filter.
My idea was twofold - this comb filter idea occurred to me after, but first I was going blunt: taking the fundamental and assigning FFT bins to the nearest neighbour. One could do a ‘comb filter’-y crossfade, but I wanted to do a hard assignment.
If I have time I’ll code it later, but I’m stuck in other code now.
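Here’s a minimal numpy sketch of that blunt nearest-neighbour assignment - odd/even splitting of a single FFT frame given a tracked fundamental. All names and parameter choices here are mine, purely illustrative, not anyone’s actual implementation:

```python
import numpy as np

def odd_even_split(frame, f0, sr):
    """Split one complex half-spectrum ((fft_size/2)+1 bins) into
    odd-harmonic and even-harmonic parts by assigning each bin to
    its nearest harmonic of the tracked fundamental f0 (Hz).
    Blunt hard assignment, no 'comb filter'-y crossfade."""
    n_bins = len(frame)
    fft_size = 2 * (n_bins - 1)
    bin_freqs = np.arange(n_bins) * sr / fft_size
    # nearest harmonic number for each bin (0 = below-fundamental region)
    harm = np.rint(bin_freqs / f0).astype(int)
    odd_mask = (harm % 2 == 1)
    even_mask = (harm % 2 == 0) & (harm > 0)
    return frame * odd_mask, frame * even_mask
```

You would run this per STFT frame, updating `f0` from your pitch tracker; bins nearest harmonic 0 (below half the fundamental) fall into neither output here, which is a design choice you might want to revisit.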
Doesn’t sound like that to me - it sounds like exactly what it says: look at spectral magnitudes and group them, via two thresholds, into three bands. This is similar to GRM Tools Contrast:
The elements of making this kind of effect sound good (as in hi-fi and not obviously FFT-y, which Contrast does towards the end of the video) are:
1 - find spectral peaks and work with their “regions of influence” (not individual bins) - see link for an explanation of that in a phase vocoder context
2 - suitable temporal smoothing of either detection inputs or soft mask outputs
Depending on your smoothing times number 1 can become less relevant.
There’s nothing in the Bitwig official demo that makes me think it’s anything other than that. The other demo sounds cleaner, but that might partly be down to the synthetic nature of the input, meaning that the input components are already much more controlled.
Something like this would be fairly trivial to build in framelib. There are already demos of spectral dynamics using peak finding and regions of influence.
@a.harker’s ideas are good - another general thing I like to do, which is less intelligent and maybe more sneaky, is to remove transient material with a very short HPSS in front. That way I can remove the most obvious betrayer of FFT processing, the lovely transient smearing…
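For anyone wanting to prototype that pre-split outside Max, here’s a toy median-filtering HPSS in the Fitzgerald style, with a `bias` knob to make the transient layer greedy. This is a sketch under my own assumptions, not FluCoMa’s HPSS:

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S_mag, time_kernel=17, freq_kernel=17, bias=2.0):
    """Toy median-filtering HPSS on a magnitude spectrogram
    (bins x frames). Harmonic content is smooth along time,
    percussive content is smooth along frequency; `bias` > 1
    makes the percussive (transient) mask greedier."""
    harm = median_filter(S_mag, size=(1, time_kernel))   # smooth in time
    perc = median_filter(S_mag, size=(freq_kernel, 1))   # smooth in freq
    perc = perc * bias                                   # greedy transients
    total = harm + perc + 1e-12
    return harm / total, perc / total  # soft Wiener-style masks
```

Run it on a short-window STFT (small FFT size, as discussed above), leave the percussive layer alone, and send only the harmonic layer into the heavier spectral processing.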
This is an interesting topic.
Slightly off topic: I tried a lot of FFT-based tools for my job as a dialogue editor, and because of their underlying technique they all more or less have this washed-out FFT sound. Some found ways to preserve the clarity of the original sound better than others, or ways to mask the artefacts. Not sure about it…
Anyway, I guess the key here is to split noise and transients from the harmonic/inharmonic (voiced) components and treat them differently.
On a pitch shifter, do we need transients to be pitch-shifted? What about noisy components?
On a time stretcher, do we need the transients to be stretched?
(This is of course a question of taste and what we are after.)
I also wonder if it would make sense to combine FFT and sample playback, or to use two different FFT sizes for different purposes: transients covered via sample playback
(alternatively an FFT with a small size) and the voiced components with an FFT size of 2048 or even higher.
Sure, I can imagine that two FFT streams (or a combination of FFT and sample playback) would be more expensive and also more difficult to handle for further FFT processing down the line…
Anyone here who has experimented with approaches like this? I am curious to learn more about it…
This is what I was talking about above, indeed: I use a short HPSS, biasing it towards being greedy for noise. Then I leave that part alone and process the pitched component. It is far from perfect, but it allows you to select your flavour of artefacts. An HPSS with a different, much shorter FFT size, like 256 64, is what I like - but again, our tools allow you to dynamically pick, in real time, what sounds best with the material you are working on.
With much help from @a.harker and @weefuzzy I’ve managed to glue together a series of objects that do something like the loudness splitter. I learned a lot about how to create soft masks, and about the wonders of fl.peaks~ for implementing the “regions of influence” thing mentioned in the vocoder paper. I actually prefer it as a dual split rather than a tri split for now, but I think the main issue is making a good interface around this that lets you set better thresholds.
The latency is a function of the hop size, like all spectral things. You could lower it and change the one place in the patch where there is a hard-coded constant, (framesize / 2) + 1:
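For reference, here’s roughly what the peaks / regions-of-influence soft masking looks like in Python terms - a sketch of the idea rather than the actual patch (names and the 6 dB transition width are my own choices), with the (framesize / 2) + 1 bin count showing up as the frame length:

```python
import numpy as np
from scipy.signal import find_peaks

def region_split(frame_mag, thresh_db):
    """Dual loudness split of one magnitude frame ((framesize/2)+1
    bins) using peak 'regions of influence': every bin inherits the
    level of its nearest spectral peak, so whole partials move
    together instead of flickering bin-by-bin."""
    eps = 1e-12
    db = 20.0 * np.log10(frame_mag + eps)
    peaks, _ = find_peaks(db)
    if len(peaks) == 0:
        peaks = np.array([int(np.argmax(db))])
    # assign each bin to its nearest peak (its region of influence)
    bins = np.arange(len(db))
    nearest = peaks[np.argmin(np.abs(bins[:, None] - peaks[None, :]), axis=1)]
    region_level = db[nearest]  # per-bin level of the governing peak
    # soft mask: 0 -> 1 transition of ~6 dB around the threshold
    loud = np.clip((region_level - thresh_db) / 6.0 + 0.5, 0.0, 1.0)
    return frame_mag * loud, frame_mag * (1.0 - loud)
```

Temporal smoothing of `loud` across frames (as discussed earlier in the thread) would go on top of this per-frame picture.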