Hey Flucoma Folks,
Sorry for the big post!
I’m currently in the process of stitching together a real-time granular concatenative (mosaicking) system in SuperCollider. The setup uses audio input from a contact microphone on a self-built noise box that I play with various exciters.
I’ve had the pleasure of chatting with Ted and Rodrigo already, and their insights have been very helpful; things are a bit clearer now. Big thanks to both of you!
While their advice has helped me to understand approaches to some of the challenges I’m facing, I’m still struggling a bit with the practical side of implementing these strategies in Supercollider.
If you have any tips, code, or resources you think could help me make progress, I’d be very grateful. There are several elements of this project I need help with, all of which are somewhat interconnected.
Currently, I’m analyzing my corpus buffer like this:
// create two datasets:
// one of mfcc analyses for each slice and one of the playback information for each slice
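// NB: assumes ~src already holds the corpus buffer and ~numCoeff is set (e.g. ~numCoeff = 13)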
(
var indices = Buffer(s);
var mfccs = Buffer(s);
var stats = Buffer(s);
var flat = Buffer(s);
var playback_info_dict = Dictionary.newFrom([
"cols",2,
"data",Dictionary.new
]);
~ds_mfccs = FluidDataSet(s);
FluidBufOnsetSlice.processBlocking(s,~src,indices:indices,metric:9,threshold:0.005); //0.005
indices.loadToFloatArray(action:{
arg fa;
// go through each slice (from one slice point to the next)
fa.doAdjacentPairs{
arg start, end, i;
var num = end - start;
var id = "slice-%".format(i);
// add playback info for this slice to this dict
playback_info_dict["data"][id] = [start,num];
FluidBufMFCC.processBlocking(s,~src,start,num,startCoeff:1,features:mfccs, numCoeffs: ~numCoeff);
FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
FluidBufFlatten.processBlocking(s,stats,destination:flat);
// add analysis info for this slice to this data set
~ds_mfccs.addPoint(id,flat);
"analyzing slice % / %".format(i+1,fa.size-1).postln;
//if((i%1000) == 999){s.sync;}
};
~ds_playback = FluidDataSet(s).load(playback_info_dict);
s.sync;
~ds_mfccs.print;
~ds_playback.print;
});
)
Then I scale (I will probably use FluidRobustScale) and populate & fit the KDTree:
// Scale and populate a kdtree!
(
Routine{
~tree = FluidKDTree(s);
~tree.numNeighbours = 2;
~tree.radius_(8.0);
~scaled_dataset = FluidDataSet(s);
~scaler = FluidNormalize(s);
// ~scaler = FluidStandardize(s);
// ~scaler = FluidRobustScale(s);
s.sync;
~scaler.fitTransform(
~ds_mfccs,
~scaled_dataset,{
~tree.fit(
~scaled_dataset,{
"Kdtree fit!".postln;
});
});
}.play;
)
~scaled_dataset.print;
Since I’m dealing with live input and my corpus is normalized, I realize I also need to put the live input through the same scaling, so that it’s consistently scaled in relation to my corpus.
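For testing from the language side, I’m assuming I could scale a single live point like this (the buffer names are placeholders, and I’m going from memory for the transformPoint signature):

// ~livePoint: one flattened analysis vector with the same dimensions as the corpus points
// ~livePointScaled: destination for the scaled vector, to be used for the tree query
~scaler.transformPoint(~livePoint, ~livePointScaled, { "live point scaled".postln; });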
Based on my understanding, it seems I might not need to lean on the real-time versions of the descriptor objects like FluidMFCC at all, and could rely solely on their Buffer versions. My aim is to trigger the analysis and scaling/normalization by onset, using a rolling buffer of, say, about 1.0 seconds (what timeframe would make sense?), and then perform the nearest-neighbour search afterwards. I suspect this approach would make it easier to stay consistent between the corpus and the matching data, while letting me do the scaling on the actual live-input data. Would this approach be fast enough?
Part of my strategy is then also to experiment with other descriptor sets, e.g. a combination of SpectralShape, Pitch, and Loudness. If any of you have insights or advice on how such a configuration with a circular buffer might be structured code-wise in my grain playback SynthDef, I’d love to hear your thoughts, or maybe you can suggest a better approach. (I’ve put a rough sketch of what I have in mind after my current SynthDef below.)
As of now, the analysis and playback portion looks like this:
(
SynthDef(\granSynth, {
.......
// Source signal and gating
.......
// Onset detection
trigOn = FluidOnsetSlice.ar(gatedSrc,metric:9,threshold: inOnsetThresh,minSliceLength:20,filterSize:7,frameDelta:0,windowSize:128);
// MFCC extraction
mfccs = FluidMFCC.kr(gatedSrc,startCoeff:1, numCoeffs: ~numCoeff);
mfccBuf = LocalBuf(mfccs.numChannels); // input buffer: the current MFCC frame for the tree query
playbackInfo = LocalBuf(4); // output buffer: [start, num] per returned neighbour
// Trigger
trig = Select.kr(trigType, [Impulse.kr(trigRate), trigOn]);
// Store MFCCs into buffer
FluidKrToBuf.kr(mfccs, mfccBuf);
// kd-tree lookup: nearest neighbour of the current MFCC frame, returning [start, num] from ~ds_playback
~tree.kr(trig,mfccBuf,playbackInfo,1,~ds_playback);
# start, num = FluidBufToKr.kr(playbackInfo);
start.poll(label:"start frame");
num.poll(label:"num frames");
......
// Calculation for TGrains playback
....
// TGrains for playback
.....
}).add;
)
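And here is a rough, untested sketch of the onset-triggered, buffer-based version I have in mind, in case that makes the question clearer. All the ~capture/~live buffer names, ~winDur, and the \liveMosaic name are placeholders of mine, and I’m assuming from memory that the FluidBuf*.kr variants take trig/blocking arguments and that the scaler can be queried on the server with ~scaler.kr(trig, inBuf, outBuf), as in the FluCoMa server-side query examples:

(
// rough sketch (untested) of the onset-triggered, buffer-based analysis and query
// placeholder buffers; I think the Fluid processors resize the analysis buffers as needed
~winDur       = 0.2;  // analysis window after each onset -- could be anything up to ~1 s
~captureBuf   = Buffer.alloc(s, s.sampleRate * ~winDur);
~liveMfcc     = Buffer(s);           // MFCC frames of the captured window
~liveStats    = Buffer(s);           // stats of those frames
~liveFlat     = Buffer(s);           // flattened mean vector = the query point
~liveScaled   = Buffer(s);           // query point after scaling
~livePlayback = Buffer.alloc(s, 2);  // [start, num] of the matched corpus slice

SynthDef(\liveMosaic, {
	arg inOnsetThresh = 0.05;
	var src, onset, writePos, analyseTrig, start, num;
	src = SoundIn.ar(0);

	// onset detection on the live input (assumes onsets are at least ~winDur apart)
	onset = FluidOnsetSlice.ar(src, metric: 9, threshold: inOnsetThresh, minSliceLength: 20);

	// capture the window following each onset: the Phasor restarts at 0 on every onset
	writePos = Phasor.ar(onset, 1, 0, BufFrames.kr(~captureBuf));
	BufWr.ar(src, ~captureBuf, writePos);

	// fire the analysis once a full window has been recorded after the onset
	analyseTrig = T2K.kr(TDelay.ar(onset, ~winDur));

	// buffer-based analysis chain, all fired by the same trigger (blocking: 1 so they run in order)
	FluidBufMFCC.kr(~captureBuf, features: ~liveMfcc,
		startCoeff: 1, numCoeffs: ~numCoeff, trig: analyseTrig, blocking: 1);
	FluidBufStats.kr(~liveMfcc, stats: ~liveStats, trig: analyseTrig, blocking: 1);
	// keep only the first stats frame (the mean), so the point matches the corpus dimensions
	FluidBufFlatten.kr(~liveStats, numFrames: 1, destination: ~liveFlat,
		trig: analyseTrig, blocking: 1);

	// scale the live point with the same (already fitted) scaler as the corpus, then query the tree
	~scaler.kr(analyseTrig, ~liveFlat, ~liveScaled);
	~tree.kr(analyseTrig, ~liveScaled, ~livePlayback, 1, ~ds_playback);

	# start, num = FluidBufToKr.kr(~livePlayback);
	start.poll(analyseTrig, label: "start frame");
	num.poll(analyseTrig, label: "num frames");

	// ...then TGrains playback of ~src using start/num, as in my current SynthDef...
}).add;
)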
Next question:
- Say I have the following set of descriptors:
  - FluidBufLoudness
  - FluidBufPitch
  - FluidBufSpectralShape
- I would like to weight the frames perceptually (let’s say by loudness) when computing the statistics with FluidBufStats. I assume I have to apply the weighting to all the other descriptors, i.e. both Pitch and SpectralShape? Which buffer from FluidBufLoudness do I pass to FluidBufStats? How would I set this up properly? (My current guess is sketched below.)
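Here is my current guess for a single slice, just to make the question concrete. It is untested; all buffer names and the slice bounds are placeholders, and I’m assuming the weights argument of FluidBufStats wants a single non-negative channel with one value per analysis frame, which is why I try to scale the loudness out of dB first:

(
// sketch: loudness-weighted stats of pitch and spectral shape for one slice of ~src
var loudness = Buffer(s), loudnessMono = Buffer(s), weights = Buffer(s);
var pitch = Buffer(s), shape = Buffer(s);
var pitchStats = Buffer(s), shapeStats = Buffer(s);
var pitchFlat = Buffer(s), shapeFlat = Buffer(s), point = Buffer(s);
var start = 0, num = 44100; // placeholder slice bounds

// 1) loudness analysis: channel 0 = loudness (dBFS), channel 1 = true peak
FluidBufLoudness.processBlocking(s, ~src, start, num, features: loudness, windowSize: 1024, hopSize: 512);

// 2) keep only the loudness channel and map it from dB to non-negative weights (0..1)
//    (values below -70 dB may need clipping so they don't end up negative)
FluidBufCompose.processBlocking(s, loudness, numChans: 1, destination: loudnessMono);
FluidBufScale.processBlocking(s, loudnessMono, destination: weights,
inputLow: -70, inputHigh: 0, outputLow: 0, outputHigh: 1);

// 3) the other descriptors, with the same windowSize/hopSize so the frame counts line up with the weights
FluidBufPitch.processBlocking(s, ~src, start, num, features: pitch, windowSize: 1024, hopSize: 512);
FluidBufSpectralShape.processBlocking(s, ~src, start, num, features: shape, windowSize: 1024, hopSize: 512);

// 4) weighted means: the same weights buffer goes into every FluidBufStats call
FluidBufStats.processBlocking(s, pitch, stats: pitchStats, select: [\mean], weights: weights);
FluidBufStats.processBlocking(s, shape, stats: shapeStats, select: [\mean], weights: weights);

// 5) flatten and join both into one point (pitch mean + confidence mean = 2 values, then the 7 shape means)
FluidBufFlatten.processBlocking(s, pitchStats, destination: pitchFlat);
FluidBufFlatten.processBlocking(s, shapeStats, destination: shapeFlat);
FluidBufCompose.processBlocking(s, pitchFlat, destination: point);
FluidBufCompose.processBlocking(s, shapeFlat, destination: point, destStartFrame: 2);
// point could then go into the dataset with something like ~ds_combined.addPoint("slice-0", point);
)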
Finally, maybe you have some suggestions for improving the overall approach.
Thanks a lot for reading and I’m looking forward to your answers!
Thanks,
Dominik