Adding and Appending Points to FluidDataSet

Hi all -

Returning to FluCoMa after a few months away (I don’t recommend taking a break!), I wanted to revisit something I didn’t quite figure out last time around.
I’m looking at the example code for the fifth iteration of the “2D Explorer” - and I’m hoping to run some sort of function to add a new set of points to the plot.
We talked a bit about it on this thread, a few months ago - but I’m still a little bit unclear.

Where would I intervene in this code to analyze and add another set of samples?

I guess the first thing is:
~src = ~loader.buffer;
This would overwrite the previous buffer - but if I add the incoming samples to the existing ~src, that means all of the samples would get re-analyzed and re-processed. That could add up to quite a delay after a while.

I’m not entirely sure how to deal with this first hurdle - and the questions compound as I follow the line of thought through the provided code. FluCoMa seems so well-documented that I tend to assume I’ve just missed something - is this solved efficiently elsewhere? Thanks for any insight.

I’m thinking maybe I didn’t explain this well enough, so I’ve been spending more time with the problem. I think I’ve distilled it into a simpler question.

With the following block of code, we get a buffer named ~indices that gives all of the slice points for an input buffer called ~src.
If I would like to take information from a second input buffer and add it to the end of ~indices, without re-analyzing ~src, how would I go about that?
Thanks again!

(
~indices = Buffer(s);
FluidBufOnsetSlice.processBlocking(s,~src,metric:9,threshold:0.05,indices:~indices,action:{
	"found % slice points".format(~indices.numFrames).postln;
	"average duration per slice: %".format(~src.duration / (~indices.numFrames+1)).postln;
});
)

Hi,

I’m unclear about whether you want to work with Buffers here, or FluidDataSet, because they’re quite distinct things.

If you want to accumulate indices from repeated runs of a slicer into a Buffer, then you could use FluidBufCompose to keep building up the list as you go. Or (perhaps more efficiently, if it fits your workflow), accumulate everything into a language-side array and then make a buffer at the end.
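
If it helps, here’s a minimal sketch of the FluidBufCompose accumulation idea (untested; ~allIndices and ~newIndices are names I’ve made up for the running accumulator and the latest slicer output):

```supercollider
// Append the latest run of slice indices onto a running index buffer.
// Note: if the new slices were computed against a different source buffer,
// you'd also need to offset the index values themselves first.
(
FluidBufCompose.processBlocking(s,
	~newIndices,                           // latest slicer output
	destination: ~allIndices,              // running accumulator
	destStartFrame: ~allIndices.numFrames  // write after the existing frames
);
)
```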

If you want to aggregate results into a FluidDataSet, then it’s a matter of calling addPoint for each new point you want to add. Perhaps it’s a bit confusing for these purposes that addPoint uses a Buffer for its ‘point’ data: here we’re just using the Buffer as a server-side array. If you wanted to accumulate an array of indices this way, then you’d need to read them off one-by-one into a single-sample buffer to act as the point. Alternatively (because that’s a drag) you could work with a single big Buffer as above and then convert the whole thing to a FluidDataSet at the end with fromBuffer.
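
To make the single-sample-buffer dance concrete, a hedged sketch (names are mine; untested):

```supercollider
(
// Read the slice indices language-side, then push each one into the
// dataset as a one-dimensional point via a single-frame buffer.
~point = Buffer.alloc(s, 1);
~ds = FluidDataSet(s);
~indices.loadToFloatArray(action: { |fa|
	fa.do { |idx, i|
		~point.set(0, idx);       // stage the value as a 1-frame 'point'
		~ds.addPoint(i, ~point);  // id = position in the index list
	};
});
// Or, all at once (if your version has it):
// ~ds2 = FluidDataSet(s).fromBuffer(~indices);
)
```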


Thanks @weefuzzy -

I’m a little foggy here, still, so please excuse my slowness. I promise I’m working my hardest to understand this idea.

My understanding is that the example (the 2D Corpus Explorer) uses a single buffer with the whole corpus loaded into it, and produces a FluidDataSet of analysis data that matches up with that main buffer. I want to add new sounds, analyze them, and then add the new plot points - over the course of a performance - which seems like a straightforward thing, but I’m somehow making it complicated by my misunderstanding. I’m definitely having a little trouble understanding when to merge datasets, when to combine buffers, and when to add points to a dataset.

I’ve started to try and break down the building blocks a little, hopefully for the sake of clarity.

The first step is to simply load up a buffer with a folder and make sure all the files are converted to mono. This will resolve the most recent folder to “currentBuf.” I’ll also keep track of a “mainBuf”, because I will be collecting different folders as this process continues. So far so good.

The next step is the first round of slicing. If I slice “currentBuf”, it returns a set of indices in a buffer. This works as expected. The third step analyzes those slices and organizes the results into a FluidDataSet. This also works as expected.

Now I’m at the point of merging this all together - I imagine I can add each iteration’s FluidDataSet (collected as ~mainDataSet) and pair them with the ongoing “mainBuf”. But I’m not sure how this would work, or if this is what you are describing. Am I on the right track here, or is this way off from what you are saying?

// first step: load buf

~load = {|folder|

var buf, loader;
buf = Buffer(s);
loader = FluidLoadFolder(folder).play(s,{"done loading folder".postln;

if(loader.buffer.numChannels > 1){

	loader.buffer.numChannels.do{
		arg chan_i;
		FluidBufCompose.processBlocking(s,
			loader.buffer,
			startChan:chan_i,
			numChans:1,
			gain:loader.buffer.numChannels.reciprocal,
			destination:buf,
			destGain:1,
			action:{"copied channel: %".format(chan_i).postln}
		);
	};
}{
	"loader buffer is already mono".postln;
	buf = loader.buffer;
};

	});
	buf;
};


//second step: slice
~onsetSlicer = {|mainBuf|
	var indices = Buffer(s);
	FluidBufOnsetSlice.processBlocking(s,mainBuf,
		metric:9,threshold:0.05,
		indices:indices,action:{
	"found % slice points".format(indices.numFrames).postln;
	"average duration per slice: %".format(mainBuf.duration / (indices.numFrames+1)).postln;
});
	indices;
};


//third step: analysis 
~analysis = {|slices, currentBuf|
var analyses = FluidDataSet(s);

slices.loadToFloatArray(action:{
	arg fa;
	var mfccs = Buffer(s);
	var stats = Buffer(s);
	var flat = Buffer(s);

	fa.doAdjacentPairs{
		arg start, end, i;
		var num = end - start;

		FluidBufMFCC.processBlocking(s,currentBuf,start,num,features:mfccs,numCoeffs:13,startCoeff:1);
		FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
		FluidBufFlatten.processBlocking(s,stats,destination:flat);

		analyses.addPoint(i,flat);

		"analyzing slice % / %".format(i+1,fa.size-1).postln;

		if((i%100) == 99){s.sync;}
	};
	s.sync;
	analyses.print;
});
	analyses;
};


~mainBuf = [];
~currentBuf = ~load.(FluidFilesPath());
~mainBuf = ~mainBuf.add(~currentBuf);
//~currentBuf -> Buffer(0, 22268421, 1, 48000.0, nil)

~offsets = ~onsetSlicer.(~currentBuf);
//~offsets -> Buffer(2, 1478, 1, 48000.0, nil)

~mainDataSet = [];
~dataset = ~analysis.(~offsets, ~currentBuf);
~mainDataSet = ~mainDataSet.add(~dataset);

//~dataset.size = 1477.

What I would recommend is to use two datasets.

The first dataset contains the audio analyses of your slices.

The second dataset has the same IDs as the first so you know which slice corresponds to each point in both datasets. So for each ID in this second dataset there are three values:

  1. the buffer that the slice came from
  2. the starting sample of the slice
  3. the number of samples in the slice

This way, what you can do is keep making new buffers of as many different sources as you want, slicing those buffers, doing the analysis and using .addPoint to store the analyses for that slice (dataset 1) and how to access that slice (dataset 2).

Trying to keep everything in one buffer, or one buffer of slice points, seems too complicated.

//=================

Another idea: instead of using dataset 2, this information could be stored in a 3-channel buffer, which might make accessing it on the server easier.
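
A rough sketch of that buffer layout (untested; names and the fixed capacity are my own):

```supercollider
(
// One frame per slice; 3 channels: [source bufnum, start sample, num samples].
// Multichannel buffers are interleaved on the server, so frame i begins at
// sample i * 3.
~maxSlices = 1000;
~sliceInfo = Buffer.alloc(s, ~maxSlices, 3);
~writeSliceInfo = { |sliceIndex, srcBufnum, start, num|
	~sliceInfo.setn(sliceIndex * 3, [srcBufnum, start, num]);
};
// e.g. ~writeSliceInfo.(0, ~b.bufnum, 0, 44100);
)
```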


Well, this definitely makes a lot more sense conceptually - but I’m still fairly in the dark about how to execute it properly.

I can show my thought process - but it doesn’t take long to get off track… Is this stuff documented anywhere that I’m missing? I think I’m particularly stuck on merging datasets and ID protocol.
Thanks.

//establish two data sets.
~dataset1 = FluidDataSet(s);    //store all incoming analysis with ID.
~dataset2 = FluidDataSet(s);   //[ID, bufferNumber, startSample, number of samples];

//
~b = Buffer.read(s, FluidFilesPath("Nicol-LoopE-M.wav"));  //load Buffer
~sliced = ~onsetSlicer.(~b);   //slice Buffer..42 points.
~dataset1 = ~analyze.(~sliced, ~b);  //analyze Buffer. 41 rows (one fewer than the 42 slice points, since adjacent points get paired).

//i'm unclear here on how to add the relevant information to ~dataset2. 

//onset function and analysis function
~onsetSlicer = {|mainBuf|
	var indices = Buffer(s);
	FluidBufOnsetSlice.processBlocking(s,mainBuf,
		metric:9,threshold:0.05,
		indices:indices,action:{
	"found % slice points".format(indices.numFrames).postln;
	"average duration per slice: %".format(mainBuf.duration / (indices.numFrames+1)).postln;
});
	indices;
};

~analyze ={|slices, currentBuf|
var analyses = FluidDataSet(s);

slices.loadToFloatArray(action:{
	arg fa;
	var mfccs = Buffer(s);
	var stats = Buffer(s);
	var flat = Buffer(s);

	fa.doAdjacentPairs{
		arg start, end, i;
		var num = end - start;

		FluidBufMFCC.processBlocking(s,currentBuf,start,num,features:mfccs,numCoeffs:13,startCoeff:1);
		FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
		FluidBufFlatten.processBlocking(s,stats,destination:flat);

		analyses.addPoint(i,flat);

		"analyzing slice % / %".format(i+1,fa.size-1).postln;

		if((i%100) == 99){s.sync;}
	};
	s.sync;
	analyses.print;
});
	analyses;
};

Completely untested, but does this make sense? Instead of the analyze function returning a fresh dataset each time, it now takes in both the analysis and index datasets and updates them.

//establish two data sets.
~dataset1 = FluidDataSet(s);    //store all incoming analysis with ID.
~dataset2 = FluidDataSet(s);   //[ID, bufferNumber, startSample, number of samples];

// Change the call to analyse to take both datasets and update them with a new slice buffer
~b = Buffer.read(s, FluidFilesPath("Nicol-LoopE-M.wav"));  //load Buffer
~sliced = ~onsetSlicer.(~b);   //slice Buffer..42 points.
~analyze.(~sliced, ~b, ~dataset1, ~dataset2);  //analyze Buffer. 41 rows (one fewer than the 42 slice points, since adjacent points get paired).

// for subsequent files, process should be the same 
~b = Buffer.read(s, FluidFilesPath("SOME OTHER FILE"));  //load Buffer
~sliced = ~onsetSlicer.(~b);   
~analyze.(~sliced, ~b, ~dataset1, ~dataset2);


//onset function and analysis function
~onsetSlicer = {|mainBuf|
	var indices = Buffer(s);
	FluidBufOnsetSlice.processBlocking(s,mainBuf,
		metric:9,threshold:0.05,
		indices:indices,action:{
			"found % slice points".format(indices.numFrames).postln;
			"average duration per slice: %".format(mainBuf.duration / (indices.numFrames+1)).postln;
	});
	indices;
};

~analyze ={|slices, currentBuf, analysis_dataset, index_dataset|
	slices.loadToFloatArray(action:{
		arg fa;
		var mfccs = Buffer(s);
		var stats = Buffer(s);
		var flat = Buffer(s);
		var slice_info = Buffer.alloc(s,3); 
		var offset = analysis_dataset.size(); 
		s.sync; 
		fa.doAdjacentPairs{
			arg start, end, i;
			var num = end - start;
			
			FluidBufMFCC.processBlocking(s,currentBuf,start,num,features:mfccs,numCoeffs:13,startCoeff:1);
			FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
			FluidBufFlatten.processBlocking(s,stats,destination:flat);
			slice_info.set(0,slices.bufnum, 1, start, 2, num); 
			index_dataset.add(i + offset, slice_info); 
			analysis_dataset.addPoint(i + offset,flat);

			"analyzing slice % / %".format(i+1,fa.size-1).postln;
			
			if((i%100) == 99){s.sync;}
		};
		s.sync;
		analysis_dataset.print;
	});
};

Hey there -

Yes - thank you - this seems like a good approach, but I’ll need to fiddle with it a bit to understand everything.

One quick error that I’m ironing out… the following doesn’t like the “+” (binary operator ‘+’ failed), and I think index_dataset.add is maybe supposed to be addPoint?

index_dataset.add(i + offset, slice_info); 
analysis_dataset.addPoint(i + offset,flat);

to:

index_dataset.addPoint(i,  slice_info); 
analysis_dataset.addPoint(i, flat);

This runs - but I don’t think it’s doing what is intended…

Yeah, addPoint sorry.

You need the + offset in there so that the IDs of the points in the dataset are distinct between calls to the function. So if it’s not working, we should figure out why: offset is supposed to be set to the size of the dataset at the top of the function, but maybe I got the call wrong.

The way it’s meant to work is, let’s say on the first run you got three slices, then you should have IDs 0,1,2 in the datasets. Then if you get three more slices next time, it should pick up where it left off and give you 3,4,5
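
The numbering can be sketched language-side, without the server, just to show the intent:

```supercollider
(
// Toy illustration of the offset scheme: each run's IDs continue from the
// running total, so IDs stay unique across calls.
var ids = [];
var addRun = { |numSlices|
	var offset = ids.size;
	numSlices.do { |i| ids = ids.add(i + offset) };
};
addRun.(3); // ids are now [ 0, 1, 2 ]
addRun.(3); // ids are now [ 0, 1, 2, 3, 4, 5 ]
ids.postln;
)
```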


Ah, I think it has something to do with how FluidDataSet returns values. This runs, but produces “Wrong Point Size” errors for each extra slice.

var offset;
analysis_dataset.size({|o| offset = o});

Yup, completely forgot that – not being a regular SC user, the whole server thing escapes my mind every time.

Had a fiddle about now, this seems to do the thing

~analyze ={|slices, currentBuf, analysis_dataset, index_dataset|
	slices.loadToFloatArray(action:{
		arg fa;
		var mfccs = Buffer(s);
		var stats = Buffer(s);
		var flat = Buffer(s);
		var offset = 0; 
		analysis_dataset.size({|o| offset = o;}); 
		s.sync; 
		fa.doAdjacentPairs{
			arg start, end, i;
			var num = end - start;
			var slice_info = Buffer.sendCollection(s,[slices.bufnum, start, num]);
			FluidBufMFCC.processBlocking(s,currentBuf,start,num,features:mfccs,numCoeffs:13,startCoeff:1);
			FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
			FluidBufFlatten.processBlocking(s,stats,destination:flat);

			index_dataset.addPoint(i + offset, slice_info); 
			analysis_dataset.addPoint(i + offset,flat);
			slice_info.free; 
		};
		s.sync;
		mfccs.free; 
		stats.free; 
		flat.free; 
	});
};

There’s a perennial frustration involved with populating buffers in a loop like this, because on the server the buffer filling magic isn’t on the non-realtime thread, so it’s hard to preserve ordering without doing what I do here and making / freeing a buffer each time round the loop.


Yes, thank you so much for helping with this - it seems like it would be very straightforward in the Max/MSP version, but I can see that it needs some extra consideration in SC. I really appreciate all of the effort in helping me get this far, and I’m sure this will be useful to others too - but I do want to clarify the next steps of this process, so all this work isn’t in vain:

My assumption is that I should umap, normalize, and fit ~dataset1 to the KDTree - but I’ll need to re-write “~play_slice” a bit so that it can get the relevant information from ~dataset2. I will need the IDs for the variables marked ~indices and the buffer numbers for the variables marked ~src. Is this accurate? If so, what’s the best way to access those values?

That sounds about right.

The IDs in ~dataset2 should line up with those in ~dataset1, so they should be the same ones that come out of the plotter when you move over the blobs. So then you can retrieve the info using ~dataset2.getPoint. You can then adjust your synthdef to look in whatever buffer you send to this getPoint call for the bufnum and playback segment.


I think there’s a slight issue again with the non-realtime ordering here…
The following returns a blank array:

(
~play_slice = {
	arg index, dataset;
	{
		var lookup = Buffer(s);
		var valueArray = [];
		dataset.getPoint(index, lookup); 
		lookup.loadToFloatArray(action:{|fa| valueArray = fa});
		index.postln;
		valueArray.postln;
}.play;
})

Whereas making valueArray a global variable does seem to receive values… but the playback doesn’t seem correct.

(
~play_slice2 = {
	arg index, dataset;
	{
		var lookup = Buffer(s);
		dataset.getPoint(index, lookup); 
		lookup.loadToFloatArray(action:{|fa| ~valueArray = fa});
		index.postln;
		~valueArray.postln;
}.play;
})

I’d ultimately like to write this in as concise a way as possible, and I’m wondering if it wouldn’t be better to get the array in FluidPlotter and pass it to “play_slice” anyway… but I found this didn’t work properly either:

(
~tree = FluidKDTree(s).fit(~normed);
~normed.dump({
	arg dict;
	var point = Buffer.alloc(s,2);
	var point2 = Buffer(s);
	var previous = nil;
	var valueArray = [];
	defer{
		FluidPlotter(dict:dict,mouseMoveAction:{
			arg view, x, y;
			//[x,y].postln;
			point.setn(0,[x,y]);
			~tree.kNearest(point,1,{
				arg nearest;
				if(nearest != previous){
					//nearest.postln;
					view.highlight_(nearest);
					~dataset2.getPoint(nearest.asInteger, point2,
					point2.loadToFloatArray(action: {|fa| valueArray = fa;}));
					~play_slice.(valueArray[0], valueArray[1], valueArray[2].asInteger);
					previous = nearest;
				}
			});
		});
	}
});
)

Any idea what I’m misunderstanding here?

I cannot test right now, but in the first part you have valueArray posting outside of the action function. I don’t know if you can be certain it has been updated…

I suspect what’s happening is that getPoint hasn’t completed before you moved on to trying to get the buffer contents language-side. You could use the action callback for getpoint to only load to array once it’s done.
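
Something like this, perhaps (untested; names are mine) - the loadToFloatArray only fires from inside getPoint’s completion action:

```supercollider
(
~lookup = Buffer(s);
~getSliceInfo = { |id, dataset, done|
	dataset.getPoint(id, ~lookup, action: {
		// only read back language-side once getPoint reports completion
		~lookup.loadToFloatArray(action: { |fa| done.(fa) });
	});
};
// usage: ~getSliceInfo.(3, ~dataset2, { |fa| fa.postln });
)
```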

However, if you’re just going to use the contents for playback anyway, seems to me you could avoid the round trip back to the language by just passing your synth the bufnum for your point and having it access the buffer directly on the server.

I’m pretty sure .loadToFloatArray goes in the OSC queue (as does .getPoint) so this would be the way to get these values out of the lookup Buffer. I also removed a strange “function inside a function” thing you had going on there.

//===========================================================

This is a good suggestion. It would also be cool to know when the playback info in the buffer has changed.


(
s.options.sampleRate = 44100;
s.waitForBoot{
	~ds_analysis_vectors = FluidDataSet(s);
	~analysis_buf = Buffer.alloc(s,13);
	~playback_info_buf = Buffer.alloc(s,3);
	~playback_info = Dictionary.newFrom([
		"cols",3,
		"data",Dictionary.new
	]);

	5.do{
		arg i;
		var id = "point-%".format(i);
		~analysis_buf.setn(0,{rrand(-130.0,130)} ! 13); // dummy data, would actually come from server
		s.sync;
		~ds_analysis_vectors.addPoint(id,~analysis_buf);
		~playback_info["data"][id] = [i%3,rrand(0,44100),rrand(44100,8820)];
	};

	s.sync;

	~ds_analysis_vectors.print;
	~ds_playback_info = FluidDataSet(s).load(~playback_info).print;

	s.sync;

	{
		arg playbackbuf;
		var playback_info;
		var buf_id, start, num;
		var trig;

		playback_info = FluidBufToKr.kr(playbackbuf,numFrames:3);
		trig = Changed.kr(playback_info).sum > 0;

		# buf_id, start, num = playback_info; // use these to play back slice!

		SendReply.kr(trig,"/buffer_updated_reply",[buf_id, start, num]);

		nil;
	}.play(args:[\playbackbuf,~playback_info_buf]);

	OSCdef(\buffer_updated_reply,{
		arg msg;
		msg.postln;
	},"/buffer_updated_reply");

	~play_slice = {
		arg id;
		~ds_playback_info.getPoint(id,~playback_info_buf);
	};
};
)

~play_slice.("point-0");
~play_slice.("point-1");
~play_slice.("point-2");
~play_slice.("point-3");
~play_slice.("point-4");

Alright - I’m posting the full code below, since I feel like maybe there is a larger issue at hand.

It’s definitely true that getPoint needs some kind of callback - but I’m not sure how that works; I might be missing where it is in the documentation… I’m attempting to pass the bufNum, start, and stop points in the FluidPlotter now - but it runs into the same situation reading from FluidDataSet.getPoint.

I’ll look closer at @tedmoore’s example also, since I think it will take me a little longer to understand the separate OSC calls - but if I’m understanding correctly, that’s the only way to do this?

Thanks for all your patience.

//establish two data sets.
~dataset1 = FluidDataSet(s);    //store all incoming analysis with ID.
~dataset2 = FluidDataSet(s);   //[ID, bufferNumber, startSample, number of samples];


// Change the call to analyse to take both datasets and update them with a new slice buffer
~b = Buffer.read(s, FluidFilesPath("Nicol-LoopE-M.wav"));  //load Buffer
~sliced = ~onsetSlicer.(~b);
~analyze.(~sliced, ~b, ~dataset1, ~dataset2);
// FluidFilesPath();
// for subsequent files, process should be the same
~b = Buffer.read(s, FluidFilesPath("Green-Box639.wav"));  //load Buffer
~sliced = ~onsetSlicer.(~b);
~analyze.(~sliced, ~b, ~dataset1, ~dataset2);

//onset function and analysis function
~onsetSlicer = {|mainBuf|
	var indices = Buffer(s);
	FluidBufOnsetSlice.processBlocking(s,mainBuf,
		metric:9,threshold:0.05,
		indices:indices,action:{
			"found % slice points".format(indices.numFrames).postln;
			"average duration per slice: %".format(mainBuf.duration / (indices.numFrames+1)).postln;
	});
	indices;
};
~analyze ={|slices, currentBuf, analysis_dataset, index_dataset|
	slices.loadToFloatArray(action:{
		arg fa;
		var mfccs = Buffer(s);
		var stats = Buffer(s);
		var flat = Buffer(s);
		var offset = 0;
		analysis_dataset.size({|o| offset = o;});
		s.sync;
		fa.doAdjacentPairs{
			arg start, end, i;
			var num = end - start;
			var slice_info = Buffer.sendCollection(s, [slices.bufnum, start, num]);
			FluidBufMFCC.processBlocking(s,currentBuf,start,num,features:mfccs,numCoeffs:13,startCoeff:1);
			FluidBufStats.processBlocking(s,mfccs,stats:stats,select:[\mean]);
			FluidBufFlatten.processBlocking(s,stats,destination:flat);
			index_dataset.addPoint(i + offset, slice_info);
			analysis_dataset.addPoint(i + offset,flat);
			slice_info.free;
		};
		s.sync;
		mfccs.free;
		stats.free;
		flat.free;
	});
};



(
~umapped = FluidDataSet(s);
FluidUMAP(s,numNeighbours:15,minDist:0.9).fitTransform(~dataset1,~umapped,action:{"umap done".postln});
)


(
~normed = FluidDataSet(s);
FluidNormalize(s).fitTransform(~umapped,~normed);
)



(
~tree = FluidKDTree(s).fit(~normed);
~normed.dump({
	arg dict;
	var point = Buffer.alloc(s,2);
	var point2 = Buffer(s);
	var valueArray = [];
	var previous = nil;
	defer{
		FluidPlotter(dict:dict,mouseMoveAction:{
			arg view, x, y;

			//[x,y].postln;
			point.setn(0,[x,y]);
			~tree.kNearest(point,1,{
				arg nearest;
				if(nearest != previous){
					~dataset2.getPoint(nearest, point2, {|fa|valueArray = fa;});
					//valueArray.postln;
					view.highlight_(nearest);
					~play_slice.(valueArray[0], valueArray[1], valueArray[2]);
					previous = nearest;
				}
			});
		});
	}
});
)


(
~play_slice = {
	arg bufNum, start, stop;
	{
		var phs = Phasor.ar(0,BufRateScale.ir(bufNum),start,stop);
		var sig = BufRd.ar(1,bufNum,phs);
		var dursecs = (stop - start) / BufSampleRate.ir(bufNum);
		var env;

		dursecs = min(dursecs,1);

		env = EnvGen.kr(Env([0,1,1,0],[0.03,dursecs-0.06,0.03]),doneAction:2);
		sig.dup * env;
	}.play;
};
)

I haven’t looked at your code yet, but I’ll just say with SuperCollider (and therefore FluCoMa in SuperCollider) there’s always multiple ways to do things! I’ll try to take a closer look in the future.

I just eyeballed it a bit, what you have looks good. Is it doing what you want?

The things I was talking about in the previous post are just another option of how to do it. What you have looks like a good solution!