Pseudo-Vocoder using FluidNMFFilter (SC)

mjsyts · July 23, 2023, 10:42pm

I modified the Bases and Activations code to make a kind of vocoder, but I suspect there is probably a better way to do this. I still don’t feel like I have a great grasp on the toolkit yet. This doesn’t seem like a great choice, because I’m splitting the file into 50 components and then using those to filter the sound. Can I have it focus more on the characteristic frequency bands for speech somehow? Or maybe I’m just going about this the wrong way.

// ====== bases and activations ========
Buffer.freeAll;
~drums = Buffer.read(s, FluidFilesPath("Tremblay-AaS-VoiceQC-B2K-M.wav"), 0, -1, {|b| b.play});
// first, let's make two new buffers called...
~bases = Buffer(s);
~activations = Buffer(s);
~n_components = 50;

// and we'll explicitly pass these into the process
FluidBufNMF.processBlocking(s,~drums,bases:~bases,activations:~activations,components:~n_components,action:{"done".postln;});

// now we can plot them (yours may end up in a different order!):
~bases.plot;
// the bases are like spectral templates that FluidBufNMF has found in the source buffer

~activations.plot;
// the activations are the corresponding loudness envelope of each base above. Each one will look like the output of
// an amplitude envelope follower tracking the sound events (here, vocal sounds) in the corresponding base.

// ========= the activations could also be used as an envelope through time ===========

// we'll use 2 components here since we have just two speakers...
FluidBufNMF.processBlocking(s,~drums,bases:~bases,activations:~activations,components:2,action:{"done".postln;});

(
{
	var activation = PlayBuf.ar(2,~activations,BufRateScale.ir(~activations),doneAction:2);
	var sig = LFTri.ar([300,400],0,0.2) * activation;
	sig;
}.play;
)

// note that the samplerate of the ~activations buffer is not a usual one...
~activations.sampleRate

(
{
	var activation = PlayBuf.ar(2,~activations,BufRateScale.ir(~activations),doneAction:2);
	var sig = WhiteNoise.ar(1);
	sig = FluidNMFFilter.ar(sig,~bases,2) * activation;
	sig.sum!2;
}.play;
)

tedmoore · July 24, 2023, 3:03pm

Hello @mjsyts,

My first thought is that 50 is probably too high of a number to get useful results. The number of components should be thought of as the number of distinct sound objects that exist in spectrum and time in the buffer. My guess is that the VoiceQC file doesn’t actually have 50 distinct spectral-temporal objects to decompose, so the algorithm will end up splitting them in ways that may seem arbitrary or confusing to a human listener.

Additionally, because we’re doing the decomposition on a matrix of FFT magnitudes, 50 is maybe too high of a number regardless. (If there are 513 bins, that would suggest an average of ~10 bins per component. This is not really how it is working, but you kind of get the idea that I’m trying to suggest here?)

Also, I wonder if you’re thinking about the number of components as being analogous to the number of filters in a vocoder? This analogy doesn’t work out because asking for 50 components will return 50 complete spectral filters (full freq range), rather than a vocoder consisting of 50 bands or something like that.

For VoiceQC I might take a listen, see how many vowels and how many fricatives I hear in there, and try that as the number of components.
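
Something like this might be a starting point. It is only a sketch that reuses the calls from the original post: the value 5 for ~n_components is an arbitrary placeholder (not a count of the sounds actually in the file), ~src is just a new variable name, and the noise level will probably need adjusting by ear.

```supercollider
// Sketch only: 5 is a placeholder guess, replace it with the number of
// distinct vowel/fricative sounds you hear in the file.
~src = Buffer.read(s, FluidFilesPath("Tremblay-AaS-VoiceQC-B2K-M.wav"));
~bases = Buffer(s);
~activations = Buffer(s);
~n_components = 5;

// decompose the voice into ~n_components bases and activations
FluidBufNMF.processBlocking(s, ~src, bases: ~bases, activations: ~activations, components: ~n_components, action: { "done".postln; });

// once "done" has posted: filter noise through every base, shape each
// filtered channel with its own activation, then mix down to two speakers
(
{
	var activation = PlayBuf.ar(~n_components, ~activations, BufRateScale.ir(~activations), doneAction: 2);
	var sig = WhiteNoise.ar(0.5);
	sig = FluidNMFFilter.ar(sig, ~bases, ~n_components) * activation;
	sig.sum ! 2;
}.play;
)
```

Each channel coming out of FluidNMFFilter here is noise shaped by one complete spectral template, so changing ~n_components changes how many templates the voice is split into, not how many frequency bands there are.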

Let me know if this is helpful!

T

mjsyts · July 27, 2023, 6:21pm

Thanks Ted! I went back through the help file and made a new NMF by myself. It made a lot more sense then, and your point about bins and components made a lot of sense too. I was thinking of the components as horizontal rather than vertical, and I thought that if I just added more of them, it would make a generalization about the relationships between the components, but that doesn’t make sense.