Hey Flucoma Folks,
Sorry for the big post!
I’m currently in the process of stitching together a real-time granular concatenative (mosaicking) system in SuperCollider. The setup uses audio input from a contact microphone on a self-built noise box that I play with various exciters.
I’ve had the pleasure of chatting with Ted and Rodrigo already, and their insights have been very helpful; things are a bit clearer now. Big thanks to both of you!
While their advice has helped me to understand approaches to some of the challenges I’m facing, I’m still struggling a bit with the practical side of implementing these strategies in Supercollider.
If you have any tips, code, or resources you think could help me make progress, I’d be very grateful. There are several elements of this project I need help with, all of which are somewhat interconnected.
Currently, I’m analyzing my corpus buffer like this:
// create two datasets:
// one of mfcc analyses for each slice and one of the playback information for each slice
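// NB: assumes ~src already holds the corpus buffer and ~numCoeff is set (e.g. ~numCoeff = 13)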
(
var indices = Buffer(s);
var mfccs = Buffer(s);
var stats = Buffer(s);
var flat = Buffer(s);
var playback_info_dict = Dictionary.newFrom([
"cols",2,
"data",Dictionary.new
]);
~ds_mfccs = FluidDataSet(s);
FluidBufOnsetSlice.processBlocking(s,~src,indices:indices,metric:9,threshold:0.005); //0.005
indices.loadToFloatArray(action:{
arg fa;
// go through each slice (from one slice point to the next)
fa.doAdjacentPairs{
arg start, end, i;
var num = end - start;
var id = "slice-%".format(i);
// add playback info for this slice to this dict
playback_info_dict["data"][id] = [start,num];
FluidBufMFCC.processBlocking(s,~src,start,num,startCoeff:1,features:mfccs, numCoeffs: ~numCoeff);
FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
FluidBufFlatten.processBlocking(s,stats,destination:flat);
// add analysis info for this slice to this data set
~ds_mfccs.addPoint(id,flat);
"analyzing slice % / %".format(i+1,fa.size-1).postln;
//if((i%1000) == 999){s.sync;}
};
~ds_playback = FluidDataSet(s).load(playback_info_dict);
s.sync;
~ds_mfccs.print;
~ds_playback.print;
});
)
Then I scale (I will probably use FluidRobustScale) and populate & fit the KDTree:
// Scale and populate a kdtree!
(
Routine{
~tree = FluidKDTree(s);
~tree.numNeighbours = 2;
~tree.radius_(8.0);
~scaled_dataset = FluidDataSet(s);
~scaler = FluidNormalize(s);
// ~scaler = FluidStandardize(s);
// ~scaler = FluidRobustScale(s);
s.sync;
~scaler.fitTransform(
~ds_mfccs,
~scaled_dataset,{
~tree.fit(
~scaled_dataset,{
"Kdtree fit!".postln;
});
});
}.play;
)
~scaled_dataset.print;
Since I’m dealing with live input and my corpus is normalized, I realize I also need to put the live input through the same scaling, so that it’s consistently scaled in relation to my corpus.
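For testing from the language side, I’m assuming I could scale a single live point like this (the buffer names are placeholders, and I’m going from memory for the transformPoint signature):

// ~livePoint: one flattened analysis vector with the same dimensions as the corpus points
// ~livePointScaled: destination for the scaled vector, to be used for the tree query
~scaler.transformPoint(~livePoint, ~livePointScaled, { "live point scaled".postln; });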
Based on my understanding, it seems I might not need to lean on the real-time versions of the descriptor objects like FluidMFCC at all, and could rely solely on their Buffer versions. My aim is to trigger the analysis and scaling/normalization by onset, using a rolling buffer of, say, about 1.0 seconds (what timeframe would make sense?), and then perform the nearest-neighbour search afterwards. I suspect this approach would make it easier to stay consistent between the corpus and the matching data, while letting me do the scaling on the actual live-input data. Would this approach be fast enough?
Part of my strategy is then also to experiment with other descriptor sets, e.g. a combination of SpectralShape, Pitch, and Loudness. If any of you have insights or advice on how such a configuration with a circular buffer might be structured code-wise in my grain playback SynthDef, I’d love to hear your thoughts, or maybe you can suggest a better approach. (I’ve put a rough sketch of what I have in mind after my current SynthDef below.)
As of now, the analysis and playback portion looks like this:
(
SynthDef(\granSynth, {
.......
// Source signal and gating
.......
// Onset detection
trigOn = FluidOnsetSlice.ar(gatedSrc,metric:9,threshold: inOnsetThresh,minSliceLength:20,filterSize:7,frameDelta:0,windowSize:128);
// MFCC extraction
mfccs = FluidMFCC.kr(gatedSrc,startCoeff:1, numCoeffs: ~numCoeff);
mfccBuf = LocalBuf(mfccs.numChannels); // input buffer: the current MFCC frame for the tree query
playbackInfo = LocalBuf(4); // output buffer: [start, num] per returned neighbour
// Trigger
trig = Select.kr(trigType, [Impulse.kr(trigRate), trigOn]);
// Store MFCCs into buffer
FluidKrToBuf.kr(mfccs, mfccBuf);
// kd-tree lookup: nearest neighbour of the current MFCC frame, returning [start, num] from ~ds_playback
~tree.kr(trig,mfccBuf,playbackInfo,1,~ds_playback);
# start, num = FluidBufToKr.kr(playbackInfo);
start.poll(label:"start frame");
num.poll(label:"num frames");
......
// Calculation for TGrains playback
....
// TGrains for playback
.....
}).add;
)
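And here is a rough, untested sketch of the onset-triggered, buffer-based version I have in mind, in case that makes the question clearer. All the ~capture/~live buffer names, ~winDur, and the \liveMosaic name are placeholders of mine, and I’m assuming from memory that the FluidBuf*.kr variants take trig/blocking arguments and that the scaler can be queried on the server with ~scaler.kr(trig, inBuf, outBuf), as in the FluCoMa server-side query examples:

(
// rough sketch (untested) of the onset-triggered, buffer-based analysis and query
// placeholder buffers; I think the Fluid processors resize the analysis buffers as needed
~winDur       = 0.2;  // analysis window after each onset -- could be anything up to ~1 s
~captureBuf   = Buffer.alloc(s, s.sampleRate * ~winDur);
~liveMfcc     = Buffer(s);           // MFCC frames of the captured window
~liveStats    = Buffer(s);           // stats of those frames
~liveFlat     = Buffer(s);           // flattened mean vector = the query point
~liveScaled   = Buffer(s);           // query point after scaling
~livePlayback = Buffer.alloc(s, 2);  // [start, num] of the matched corpus slice

SynthDef(\liveMosaic, {
	arg inOnsetThresh = 0.05;
	var src, onset, writePos, analyseTrig, start, num;
	src = SoundIn.ar(0);

	// onset detection on the live input (assumes onsets are at least ~winDur apart)
	onset = FluidOnsetSlice.ar(src, metric: 9, threshold: inOnsetThresh, minSliceLength: 20);

	// capture the window following each onset: the Phasor restarts at 0 on every onset
	writePos = Phasor.ar(onset, 1, 0, BufFrames.kr(~captureBuf));
	BufWr.ar(src, ~captureBuf, writePos);

	// fire the analysis once a full window has been recorded after the onset
	analyseTrig = T2K.kr(TDelay.ar(onset, ~winDur));

	// buffer-based analysis chain, all fired by the same trigger (blocking: 1 so they run in order)
	FluidBufMFCC.kr(~captureBuf, features: ~liveMfcc,
		startCoeff: 1, numCoeffs: ~numCoeff, trig: analyseTrig, blocking: 1);
	FluidBufStats.kr(~liveMfcc, stats: ~liveStats, trig: analyseTrig, blocking: 1);
	// keep only the first stats frame (the mean), so the point matches the corpus dimensions
	FluidBufFlatten.kr(~liveStats, numFrames: 1, destination: ~liveFlat,
		trig: analyseTrig, blocking: 1);

	// scale the live point with the same (already fitted) scaler as the corpus, then query the tree
	~scaler.kr(analyseTrig, ~liveFlat, ~liveScaled);
	~tree.kr(analyseTrig, ~liveScaled, ~livePlayback, 1, ~ds_playback);

	# start, num = FluidBufToKr.kr(~livePlayback);
	start.poll(analyseTrig, label: "start frame");
	num.poll(analyseTrig, label: "num frames");

	// ...then TGrains playback of ~src using start/num, as in my current SynthDef...
}).add;
)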
Next question:
- Say I have the following set of descriptors:
  - FluidBufLoudness
  - FluidBufPitch
  - FluidBufSpectralShape
- I would like to weight the frames perceptually (let’s say by loudness) when computing the statistics with FluidBufStats. I assume I have to apply the weighting to all the other descriptors, i.e. both Pitch and SpectralShape? Which buffer from FluidBufLoudness do I pass to FluidBufStats? How would I set this up properly? (My current guess is sketched below.)
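Here is my current guess for a single slice, just to make the question concrete. It is untested; all buffer names and the slice bounds are placeholders, and I’m assuming the weights argument of FluidBufStats wants a single non-negative channel with one value per analysis frame, which is why I try to scale the loudness out of dB first:

(
// sketch: loudness-weighted stats of pitch and spectral shape for one slice of ~src
var loudness = Buffer(s), loudnessMono = Buffer(s), weights = Buffer(s);
var pitch = Buffer(s), shape = Buffer(s);
var pitchStats = Buffer(s), shapeStats = Buffer(s);
var pitchFlat = Buffer(s), shapeFlat = Buffer(s), point = Buffer(s);
var start = 0, num = 44100; // placeholder slice bounds

// 1) loudness analysis: channel 0 = loudness (dBFS), channel 1 = true peak
FluidBufLoudness.processBlocking(s, ~src, start, num, features: loudness, windowSize: 1024, hopSize: 512);

// 2) keep only the loudness channel and map it from dB to non-negative weights (0..1)
//    (values below -70 dB may need clipping so they don't end up negative)
FluidBufCompose.processBlocking(s, loudness, numChans: 1, destination: loudnessMono);
FluidBufScale.processBlocking(s, loudnessMono, destination: weights,
inputLow: -70, inputHigh: 0, outputLow: 0, outputHigh: 1);

// 3) the other descriptors, with the same windowSize/hopSize so the frame counts line up with the weights
FluidBufPitch.processBlocking(s, ~src, start, num, features: pitch, windowSize: 1024, hopSize: 512);
FluidBufSpectralShape.processBlocking(s, ~src, start, num, features: shape, windowSize: 1024, hopSize: 512);

// 4) weighted means: the same weights buffer goes into every FluidBufStats call
FluidBufStats.processBlocking(s, pitch, stats: pitchStats, select: [\mean], weights: weights);
FluidBufStats.processBlocking(s, shape, stats: shapeStats, select: [\mean], weights: weights);

// 5) flatten and join both into one point (pitch mean + confidence mean = 2 values, then the 7 shape means)
FluidBufFlatten.processBlocking(s, pitchStats, destination: pitchFlat);
FluidBufFlatten.processBlocking(s, shapeStats, destination: shapeFlat);
FluidBufCompose.processBlocking(s, pitchFlat, destination: point);
FluidBufCompose.processBlocking(s, shapeFlat, destination: point, destStartFrame: 2);
// point could then go into the dataset with something like ~ds_combined.addPoint("slice-0", point);
)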
Finally, maybe you have some suggestions for improving the overall approach.
Thanks a lot for reading and I’m looking forward to your answers!
Thanks,
Dominik