Exploring sound corpora in SC

jan · March 15, 2022, 8:46pm

Hello all,

after getting acquainted with the tools and having decomposed and sliced sound into buffers/files, what would be a recommended next step for creating corpora and exploring patterns and/or (spectral) relations among their elements in SC?

Thanks,
Jan

tedmoore · March 15, 2022, 8:52pm

Hi @jan,

I would recommend checking out this sequence of files. Some of it will already be familiar to you, and some may be new. Let me know if this is the kind of thing you’re looking for and what questions you have!

Cheers,

Ted
01_buf_slice.scd (1.4 KB)
02_buf_slice_analysis_sort.scd (1.9 KB)
03_buf_slice_analysis_2D_plot.scd (2.6 KB)
04_buf_slice_analysis_2D_plot_kdtree.scd (6.8 KB)
05_buf_slice_analysis_2D_plot_kdtree_umap.scd (3.5 KB)

jan · March 16, 2022, 5:56am

Thank you @tedmoore, this looks like very valuable learning material! I will look into it and report back as i get along!

jan · March 16, 2022, 6:00pm

After trying out the examples, and playing with especially the fifth, a few questions:

Sticking to FluidBufMFCC as analysis, what do X and Y Axis in the plotter represent?
By what sound criteria do the clusters get organized in this case?
How could one effectively use this data re-arrangement to, for example, render clusters/groups of N nearest neighbours without a Plotter ( e.g. in NRT)?
Can one save these DataSets so not to have to repeat the analysis process (especially of larger files)?

It's really quite wonderful to delve into this approach!!

Thanks,
Jan

jan · March 18, 2022, 3:45pm

to specify the third question: could the KDTree visualization be used to navigate the corpus, so that elements/clusters are selected from it and given as input (recording the indices to a new buffer?) to another synth?

tedmoore · March 19, 2022, 2:46pm

Hi @jan. Great questions.

When looking at a 2 dimensional space that is the output of UMAP, neither dimension, x or y, represents anything “human readable”. They aren’t any particular dimensions from the analysis. If you want the XY dimensions to refer to analysis dimensions, use these files up through file “04”.

This is a good next question. As I said above, X and Y here are not “human readable” in any way. What UMAP does is organize all of the data points (sound slices) into this 2D space so that sound analyses that are similar to each other end up near each other. So sound slices that have similar MFCC analyses should be near each other in this space. This is really useful because the MFCC analysis is in 13 dimensions, which is really not possible to visualize, so UMAP helps!

You can call the method kNearest on the FluidKDTree, just like is done in these files, and that will give you the nearest neighbor to play back or use otherwise. You can also ask for more neighbors than just the 1 nearest (specified when you create the object).

Note that this neighbor search could be done in the 2D of the UMAP output (as in the files) or even in the 13 dimensions of the MFCCs.
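A rough sketch of asking for several neighbours at once, without any Plotter involved. It assumes `~ds_norm` is the normalized 2D dataset from the files above; the query point and the number 5 are arbitrary, and the argument names (`numNeighbours`, `action`) are what I believe the class uses, so check them against the help file of your installed version:

```supercollider
// Assumes the server is booted and ~ds_norm already exists.
(
var kdtree = FluidKDTree(s, numNeighbours: 5); // ask for 5 neighbours instead of 1
var query = Buffer.alloc(s, 2);
fork{
	kdtree.fit(~ds_norm);
	query.setn(0, [0.5, 0.5]); // an arbitrary point in the middle of the normalized space
	s.sync;
	kdtree.kNearest(query, action: {
		arg ids; // the identifiers of the neighbours that were found
		"nearest slices: %".format(ids).postln;
	});
};
)
```

The identifiers that come back can be split into slice indices the same way the plotter examples do, so the result could just as well drive a Routine or be written into a buffer.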

For clustering, check out KMeans, which again could happen in the 2D space output of UMAP or the 13 dimensions of MFCCs.

Just to be extra clear, the 2D visualization in file “05” is the output of UMAP. The KDTree is just used to look up the nearest point to the mouse in the 2D space.

I’d be curious to hear more about what you’re looking to do here. Perhaps the short answer is that the FluidPlotter has a mouseMoveAction that gets passed these arguments: the plotter, mouse x, mouse y, modifier keys, button number, and click count. You could keep track of some of this info along with the nearest neighbors and put it into buffers or wherever will be useful for you!

Absolutely. Use the write method to save to disk and the read method to read from disk. This is true for all the Fluid data objects (KDTree, KMeans, Standardize, etc.).
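Something like this should do for the round trip. It assumes `~ds` is the dataset built in the analysis step; the file name and the temp-directory location are only placeholders, and `~path` and `~ds_restored` are names made up for this sketch:

```supercollider
// Assumes the server is booted and ~ds already holds the analysis.
(
~path = Platform.defaultTempDir +/+ "mfcc_ds.json"; // placeholder location

// write the dataset to disk as JSON
~ds.write(~path, { "dataset written to %".format(~path).postln });
)

// later (e.g. in a new session), read it back into a fresh dataset
(
~ds_restored = FluidDataSet(s).read(~path, {
	~ds_restored.print; // check that the points are back
});
)
```

That way the slicing and MFCC analysis of a large file only has to happen once.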

Let me know if those answers are useful, and what other questions they may bring up!

Cheers,

Ted

jan · March 19, 2022, 5:00pm

Hi @tedmoore,

thank you for the explanations, that really clarifies a few things to me!

Regarding the specification i basically followed an intuition that would offer a flexible workflow using these tools, for example:
assuming a UMAP visualisation where 8 clusters are quite distinct, i’d like to be able to select a cluster with, let’s say, 100 points, which could then be read in some sequence by a UGen using the range 1-100. Another variation would be to read through the clusters sequentially, so that no second point of the same cluster is read before moving through the other 7.
So generally the idea would be to make the data of “meaningful segments” (e.g. the clusters themselves, which prove “meaningful” through similarity), perceived through the visualisation, accessible to the reading flexibility offered by UGens (instead of mouse actions).
I hope that makes the intention more clear?

Thanks so much, also for encouraging the inquiry!

jan · March 20, 2022, 8:14am

Having looked into KMeans, it seems that it could substitute for the manual selection of clusters i was aiming at for the sake of visual overview (which is quite nice to navigate, really). What’s not entirely clear to me is whether the number of clusters depends on how the analysis differentiates the timbres of the sound source?
Adapting the aforementioned example i would seek to achieve getting N clusters with X points (maybe even ordered by some criteria like centroid on both scales, within and among the clusters themselves) that could be easily accessed server side in a synth.

I find the possibilities these tools offer really incredible, but still find myself quite overwhelmed by the complexity of the analysis procedures they require…

tedmoore · March 21, 2022, 10:53am

Hello @jan!

Great questions.

Yes. These ideas are totally doable with the tools. You’re probably right to be looking at KMeans to find these clusters. You can get the output of KMeans predictions (which is written into a FluidLabelSet) and then sort by cluster that way.

The number of clusters (the “K” in KMeans) is totally determined by the user. Looking at the UMAP and thinking of what number you think you see is a good starting point for estimating how many clusters you should ask for.

One thing to keep in mind is that the KMeans algorithm doesn’t at all try to keep the number of points in each cluster equal. In fact, KMeans can end up with clusters that have zero points! If this is the case, FluidKMeans will give a warning in the post window.
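To check how the points ended up being distributed, you can tally the labels yourself. This little sketch runs without the server: the dictionary is made-up data in the same shape that `~labels.dump` hands to its action (an id mapped to an array holding the label), and `numClusters` is whatever you asked KMeans for:

```supercollider
(
// made-up label dump: "data" maps each id to an array holding its cluster label
var dump = Dictionary.newFrom(["cols", 1, "data", Dictionary.newFrom([
	"slice-0", ["0"], "slice-1", ["2"], "slice-2", ["0"], "slice-3", ["2"], "slice-4", ["2"]
])]);
var numClusters = 3;
var counts = Array.fill(numClusters, 0);

// count how many points carry each label
dump["data"].keysValuesDo{
	arg id, label;
	var cluster = label[0].asInteger;
	counts[cluster] = counts[cluster] + 1;
};

// report the size of every cluster and flag the empty ones
counts.do{
	arg n, cluster;
	"cluster %: % points%".format(cluster, n, if(n == 0, { " (empty!)" }, { "" })).postln;
};
)
```

With the made-up data above, cluster 1 comes out empty, which is the situation the warning in the post window is telling you about.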

Another thing to consider is whether you want to do the KMeans clustering on the output of UMAP (in the 2 dimensional space) or on the MFCC analysis itself (in 13 dimensional space). You’ll get different results, so a good approach is seeing which one is more musically meaningful to you and going with that!

Yep. This is totally doable. In this case, it seems like you might be wanting to put playback info into a buffer (like where the slice starts and how long it is) and then read out that information on the server. Once you get your clusters and such sorted in a way that is useful, I can help brainstorm how it might be played back!

Thank you for pioneering on! Your questions are very helpful for us to see what kinds of things users want to do and what kinds of things are challenging to achieve! Also, it helps us populate this Discourse with useful questions and answers and examples!

s.boot;

// 1. Load a folder of sounds
(
~folder_path = FluidFilesPath();
~loader = FluidLoadFolder(~folder_path);
~loader.play(s,{
	"loaded % soundfiles".format(~loader.index.size).postln;
})
)

// 2. mono-ize
(
~mono_buf = Buffer(s);
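// copy channel 0 into the mono buffer...
FluidBufCompose.processBlocking(s,~loader.buffer,startChan:0,numChans:1,destination:~mono_buf);
// ...then add channel 1 on top of it (destGain:1 keeps what is already there); assumes the loaded buffer is stereo
FluidBufCompose.processBlocking(s,~loader.buffer,startChan:1,numChans:1,destination:~mono_buf,destGain:1);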
)

~mono_buf.plot;

// 3. Slice
(
~indices = Buffer(s);
FluidBufNoveltySlice.processBlocking(s,~mono_buf,indices:~indices,threshold:0.5,action:{
	"% slices found".format(~indices.numFrames).postln;
	"average duration in seconds: %".format(~mono_buf.duration/~indices.numFrames).postln;
});
)

// 4. Analyze
(
fork{
	var feature_buf = Buffer(s);
	var stats_buf = Buffer(s);
	var point_buf = Buffer(s);
	~ds = FluidDataSet(s);
	~indices.loadToFloatArray(action:{
		arg fa;
		fa.doAdjacentPairs{
			arg start, end, i;
			var num = end - start;

			FluidBufMFCC.processBlocking(s,~mono_buf,start,num,features:feature_buf,numCoeffs:13,startCoeff:1);
			FluidBufStats.processBlocking(s,feature_buf,stats:stats_buf);
			FluidBufFlatten.processBlocking(s,stats_buf,numFrames:1,destination:point_buf);

			~ds.addPoint("slice-%".format(i),point_buf);
			if(i % 100 == 1,{s.sync});
			"% / % done".format(i+1,~indices.numFrames-1).postln;
		};

		~ds.print;
	});
};
)

// 5. Reduce to 2 Dimensions using UMAP
(
var umap = FluidUMAP(s,2);
~ds_umap = FluidDataSet(s);

// perform umap
umap.fitTransform(~ds,~ds_umap,{"umap complete".postln});
)

// 6. plot and make sound
(
var kdtree = FluidKDTree(s);
var buf_2d = Buffer.alloc(s,2);
var scaler = FluidNormalize(s);
~ds_norm = FluidDataSet(s);

// whatever the output of umap is, scale it to be between 0 and 1 so that it will look nice in the plotter
scaler.fitTransform(~ds_umap,~ds_norm);

kdtree.fit(~ds_norm);
~ds_norm.dump({
	arg dict;
	var previous, fp;
	fp = FluidPlotter(bounds:Rect(0,0,800,800),dict:dict,mouseMoveAction:{
		arg view, x, y;
		[x,y].postln; // get the (normalized) x, y position of the mouse and...
		buf_2d.setn(0,[x,y]); // load it into a buffer so that...
		kdtree.kNearest(buf_2d,{ // it can be passed to the kdtree to find the nearest neighbour, which is reported back...
			arg nearest; // here
			if(previous != nearest,{ // only if it is a "new" nearest neighbour, should it make sound
				var index = nearest.asString.split($-)[1].asInteger; // peel off the index of the slice
				previous = nearest;
				"nearest point is: %".format(nearest).postln;
				{
					var startPos = Index.kr(~indices,index); // look up the start position
					var dur_samps = Index.kr(~indices,index + 1) - startPos; // calculate the duration in samples

					// play the buffer starting from the start position
					var sig = PlayBuf.ar(1,~mono_buf,BufRateScale.ir(~mono_buf),startPos:startPos);
					var dur_sec = dur_samps / BufSampleRate.ir(~mono_buf);
					var env = EnvGen.kr(Env([0,1,1,0],[0.03,dur_sec-0.06,0.03]),doneAction:2);
					sig.dup * env;
				}.play;
			});
		});
	});
});
)

// 7. Cluster the UMAP output and display it with clusters in the Plotter
(
var kdtree = FluidKDTree(s);
var buf_2d = Buffer.alloc(s,2);
var scaler = FluidNormalize(s);

~labels = FluidLabelSet(s);
FluidKMeans(s,8).fitPredict(~ds_umap,~labels,{"kmeans complete".postln});

// whatever the output of umap is, scale it to be between 0 and 1 so that it will look nice in the plotter
~ds_norm = FluidDataSet(s);
scaler.fitTransform(~ds_umap,~ds_norm);

kdtree.fit(~ds_norm);
~ds_norm.dump({
	arg dict;
	var previous, fp;
	fp = FluidPlotter(bounds:Rect(0,0,800,800),dict:dict,mouseMoveAction:{
		arg view, x, y;
		[x,y].postln; // get the (normalized) x, y position of the mouse and...
		buf_2d.setn(0,[x,y]); // load it into a buffer so that...
		kdtree.kNearest(buf_2d,{ // it can be passed to the kdtree to find the nearest neighbour, which is reported back...
			arg nearest; // here
			if(previous != nearest,{ // only if it is a "new" nearest neighbour, should it make sound
				var index = nearest.asString.split($-)[1].asInteger; // peel off the index of the slice
				previous = nearest;
				"nearest point is: %".format(nearest).postln;
				{
					var startPos = Index.kr(~indices,index); // look up the start position
					var dur_samps = Index.kr(~indices,index + 1) - startPos; // calculate the duration in samples

					// play the buffer starting from the start position
					var sig = PlayBuf.ar(1,~mono_buf,BufRateScale.ir(~mono_buf),startPos:startPos);
					var dur_sec = dur_samps / BufSampleRate.ir(~mono_buf);
					var env = EnvGen.kr(Env([0,1,1,0],[0.03,dur_sec-0.06,0.03]),doneAction:2);
					sig.dup * env;
				}.play;
			});
		});
	});

	~labels.dump({
		arg labels_dict;
		fp.categories_(labels_dict);
	});
});
)

// 8. Cluster the MFCCs output and display it with clusters in the Plotter
(
var kdtree = FluidKDTree(s);
var buf_2d = Buffer.alloc(s,2);
var scaler = FluidNormalize(s);

~labels = FluidLabelSet(s);
FluidKMeans(s,8).fitPredict(~ds,~labels,{"kmeans complete".postln});

// whatever the output of umap is, scale it to be between 0 and 1 so that it will look nice in the plotter
~ds_norm = FluidDataSet(s);
scaler.fitTransform(~ds_umap,~ds_norm);

kdtree.fit(~ds_norm);
~ds_norm.dump({
	arg dict;
	var previous, fp;
	fp = FluidPlotter(bounds:Rect(0,0,800,800),dict:dict,mouseMoveAction:{
		arg view, x, y;
		[x,y].postln; // get the (normalized) x, y position of the mouse and...
		buf_2d.setn(0,[x,y]); // load it into a buffer so that...
		kdtree.kNearest(buf_2d,{ // it can be passed to the kdtree to find the nearest neighbour, which is reported back...
			arg nearest; // here
			if(previous != nearest,{ // only if it is a "new" nearest neighbour, should it make sound
				var index = nearest.asString.split($-)[1].asInteger; // peel off the index of the slice
				previous = nearest;
				"nearest point is: %".format(nearest).postln;
				{
					var startPos = Index.kr(~indices,index); // look up the start position
					var dur_samps = Index.kr(~indices,index + 1) - startPos; // calculate the duration in samples

					// play the buffer starting from the start position
					var sig = PlayBuf.ar(1,~mono_buf,BufRateScale.ir(~mono_buf),startPos:startPos);
					var dur_sec = dur_samps / BufSampleRate.ir(~mono_buf);
					var env = EnvGen.kr(Env([0,1,1,0],[0.03,dur_sec-0.06,0.03]),doneAction:2);
					sig.dup * env;
				}.play;
			});
		});
	});

	~labels.dump({
		arg labels_dict;
		fp.categories_(labels_dict);
	});
});
)

// 9. Take out labels information and organize in a dictionary
(
~labels.dump({
	arg dict;
	~clusters = Dictionary.new;
	dict["data"].keysValuesDo{
		arg id, cluster;
		cluster = cluster[0].asInteger;
		if(~clusters[cluster].isNil,{~clusters[cluster] = List.new});
		~clusters[cluster].add(id);
	};
	~clusters.keysValuesDo{
		arg k, v;
		"points in cluster %: %".format(k,v).postln;
	};
});
)

// 10. Do something with all slices from cluster 5 (you could also sort this list by some other analysis if you like):
(
~clusters[5].do{
	arg  id;
	"% is in cluster 5".format(id).postln;
}
)

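// 11. Rotate through the clusters, but play each slice once before repeating in that cluster
(
var clustersPx = Dictionary.new;
var keys = ~clusters.keys.asArray.sort; // only the clusters that actually contain points
~clusters.keysValuesDo{
	arg k, v;
	// Pshuf plays every item once in a random order and Pn reshuffles for each new pass
	// (Pxrand would only avoid immediate repeats)
	clustersPx[k] = Pn(Pshuf(v.asArray,1),inf).asStream;
};

fork{
	100.do{
		arg i;
		var clust = keys.wrapAt(i); // step through the non-empty clusters in order
		"a slice from cluster %: %".format(clust,clustersPx[clust].next).postln;
	};
}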

)

jan · March 21, 2022, 1:28pm

Hi @tedmoore,

thanks a lot for the great example, and so good to know that these questions can in any way contribute (i often wonder whether im asking too much)!

One thing that strikes me on both the umap and mfcc plotting is that while the spatial distribution of points remains the same, the color coding is different. i’d expect that each cluster would get its own color, although that is not the case no matter how many clusters i predefine. im not yet entirely clear what (hyper?)parameters define this fine tuning of spatial clusters and color code, which ideally should match (at least in my thinking). Is there a way to assign a different color the moment clusters/points have a minimal distance between them (meaning their timbre is likely to differ considerably)?
Or, maybe the color coding is unnecessary and the spatial clusters are enough (which in my example seems to be the case)?

And as you suggest, the moment i have these clusters sorted i’d love to be able to play them back server-side, the sample points being accessible somehow in a synth, (in my case preferably through buffers or arrays as these are accessible to demand rate Ugens)

Excited to be approaching the matter step by step, thanks!

PS: Here is a visual example of what i mean. Here i set 16 clusters on FluidKMeans and get quite a few warnings about empty clusters, and spatially very divergent clusters fall under the same color!

tedmoore · March 21, 2022, 10:30pm

You are not! We appreciate it!

One thing to note is that whichever dataset you do the KMeans clustering with, the code above displays the 2D UMAP space, so even if the clusters are made in the 13 dimensional MFCC space they’re seen post-UMAP.

KMeans can sometimes not work great because it is expecting to find rather globular (i.e. circular / spherical) clusters. When the data isn’t organized like this, it won’t work too well. (Although the globs below look rather globular!)

Because KMeans begins from a random seeding, the clusters’ order (and therefore their colors) is not predictable. It is possible to seed the clusters’ initial positions, which may give the algorithm some hints about where you think the clusters are, and may give you some control over the order of the clusters. You might try seeding KMeans with the positions where you see clusters and check whether it works better.

See example code below

I see what you mean. Can you send this dataset so I can make a demo with it? The json will work for me!
// ================================================================
Here’s some example code for seeding KMeans, using FluidPlotter to visually choose where the seeds should be.

s.boot;

// make and plot a dataset with 5 globs
(
Window.closeAll;
fork{
	var counter = 0;
	var xybuf = Buffer.alloc(s,2);
	~dict = Dictionary.newFrom(["cols",2,"data",Dictionary.new]);

	1000.do{
		arg i;
		var x = [0.3,0.7,0.5,0.7,0.3][i % 5];
		var y = [0.3,0.7,0.5,0.3,0.7][i % 5];
		x = gauss(x,0.03).clip(0,1);
		y = gauss(y,0.03).clip(0,1);
		~dict["data"][i] = [x,y];
	};

	~mean_seeds = FluidDataSet(s);

	~fp = FluidPlotter(bounds:Rect(800,0,800,800),dict:~dict,mouseMoveAction:{
		arg view, x, y, modifiers, butnum, clickcount;
		if(butnum.notNil,{// don't add a point when the mouse is released
			xybuf.setn(0,[x,y]);
			~mean_seeds.addPoint(counter,xybuf,{
				~mean_seeds.print;
			});
			counter = counter + 1;
		});
	});
}
)

// click on where you think clusters are to get the positions of seeds

// run a KMeans with these seeds
(
~ds = FluidDataSet(s).load(~dict);
~labels = FluidLabelSet(s);
~mean_seeds.size({
	arg sz;
	~kmeans = FluidKMeans(s,sz).setMeans(~mean_seeds);
	~kmeans.fitPredict(~ds,~labels,{
		~labels.dump({
			arg labels;
			~fp.categories_(labels);
		});
	});
});
)

// ========================================================
Some example code to get playback info into a buffer. This is modified from the example code in a post above!

s.boot;

// 1. Load a folder of sounds
(
~folder_path = FluidFilesPath();
~loader = FluidLoadFolder(~folder_path);
~loader.play(s,{
	"loaded % soundfiles".format(~loader.index.size).postln;
})
)

// 2. mono-ize
(
~mono_buf = Buffer(s);
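// copy channel 0 into the mono buffer...
FluidBufCompose.processBlocking(s,~loader.buffer,startChan:0,numChans:1,destination:~mono_buf);
// ...then add channel 1 on top of it (destGain:1 keeps what is already there); assumes the loaded buffer is stereo
FluidBufCompose.processBlocking(s,~loader.buffer,startChan:1,numChans:1,destination:~mono_buf,destGain:1);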
)

// 3. Slice
(
~indices = Buffer(s);
FluidBufNoveltySlice.processBlocking(s,~mono_buf,indices:~indices,threshold:0.5,action:{
	"% slices found".format(~indices.numFrames).postln;
	"average duration in seconds: %".format(~mono_buf.duration/~indices.numFrames).postln;
});
)

// 4. Analyze
(
fork{
	var feature_buf = Buffer(s);
	var stats_buf = Buffer(s);
	var point_buf = Buffer(s);
	var play_info_buf = Buffer.alloc(s,2);

	s.sync;

	~ds = FluidDataSet(s);
	~ds_play_info = FluidDataSet(s);
	~indices.loadToFloatArray(action:{
		arg fa;
		fa.doAdjacentPairs{
			arg start, end, i;
			var num = end - start;

			FluidBufMFCC.processBlocking(s,~mono_buf,start,num,features:feature_buf,numCoeffs:13,startCoeff:1);
			FluidBufStats.processBlocking(s,feature_buf,stats:stats_buf);
			FluidBufFlatten.processBlocking(s,stats_buf,numFrames:1,destination:point_buf);

			~ds.addPoint("slice-%".format(i),point_buf);
			play_info_buf.setn(0,[start,num]);
			s.sync;
			~ds_play_info.addPoint("slice-%".format(i),play_info_buf);
			if(i % 100 == 1,{s.sync});
			"% / % done".format(i+1,~indices.numFrames-1).postln;
		};

		~ds.print;
	});
};
)

// 5. Cluster the MFCCs output
(
~labels = FluidLabelSet(s);
FluidKMeans(s,16).fitPredict(~ds,~labels,{"kmeans complete".postln});
)

// 6. Take out labels information and organize in a dictionary
(
~labels.dump({
	arg dict;
	~clusters = Dictionary.new;
	dict["data"].keysValuesDo{
		arg id, cluster;
		cluster = cluster[0].asInteger;
		if(~clusters[cluster].isNil,{~clusters[cluster] = List.new});
		~clusters[cluster].add(id);
	};
	~clusters.keysValuesDo{
		arg k, v;
		"points in cluster %: %".format(k,v).postln;
	};
});
)

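// 7. Put the "play info" of all the slices in each cluster into a buffer (one buffer per cluster)
(
fork{
	var tmpbuf = Buffer(s);
	// sized by the highest cluster index, because empty clusters leave gaps in ~clusters
	~play_info_clusters = {Buffer(s)} ! (~clusters.keys.maxItem + 1);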
	~clusters.keysValuesDo{
		arg cluster_i, list;
		list.do{
			arg id, i;
			"cluster %\tid %".format(cluster_i,id).postln;
			~ds_play_info.getPoint(id,tmpbuf);
			s.sync;
			// write the first frame to channel 0, frame "i"
			FluidBufCompose.processBlocking(s,tmpbuf,startFrame:0,numFrames:1,destination:~play_info_clusters[cluster_i.asInteger],destStartFrame:i,destStartChan:0);

			// write the second frame to channel 1, frame "i"
			FluidBufCompose.processBlocking(s,tmpbuf,startFrame:1,numFrames:1,destination:~play_info_clusters[cluster_i.asInteger],destStartFrame:i,destStartChan:1);
		}
	};
	s.sync;
	// {~play_info_clusters[5].plot}.defer;
};
)

// 8. playback according to the buffer
(
~play_all_in_cluster = {
	arg cluster_num;
	{
		var trig = Impulse.kr(1);
		var index = PulseCount.kr(trig) - 1;
		var start, num, center_seconds;

		index.poll;

		// free once every slice has played (valid indices are 0 to numFrames - 1)
		FreeSelf.kr(index >= BufFrames.ir(~play_info_clusters[cluster_num]));

		# start, num = BufRd.kr(2,~play_info_clusters[cluster_num],index,0,1);
		start.poll;
		num.poll;
		center_seconds = (start + (num/2)) / BufSampleRate.ir(~mono_buf);

		// TGrains needs at least 2 output channels; the default pan of 0 centers the grain, so no .dup
		TGrains.ar(2,trig,~mono_buf,1,center_seconds,num/BufSampleRate.ir(~mono_buf));
	}.play;
};
)

~play_all_in_cluster.(0);
~play_all_in_cluster.(1);
~play_all_in_cluster.(2);
~play_all_in_cluster.(3);
~play_all_in_cluster.(4);
~play_all_in_cluster.(5);
// etc.

Let me know if this is what you’re looking for and how else I can help!

jan · March 22, 2022, 8:05am

Hi @tedmoore,

im looking now into the new examples, thank you! It seems at first to be exactly what im after.

Here are the json files of the DataSet before and after Umap.
(i zipped it as the json itself could not be uploaded for some reason)

DataSet.zip (1.2 MB)

tedmoore · March 22, 2022, 10:51am

Hi @jan,

Thanks for sending this. As I suspected, seeding KMeans with points in the center of the clusters we see in the dataset works quite well!

jan · March 22, 2022, 11:02am

Hi @tedmoore,
it really looks like it! Got it working as well from the adapted video code.


(
Window.closeAll;
fork{
	var counter = 0;
	var xybuf = Buffer.alloc(s,2);
	~ds =FluidDataSet(s).read("/Users/jan/Downloads/DataSet/DataSetUmap-jan.json");
	
	FluidNormalize(s).fitTransform(~ds,~ds);

	~mean_seeds = FluidDataSet(s);
~ds.dump({ arg dict;
	~fp = FluidPlotter(bounds:Rect(800,0,800,800),dict:dict,mouseMoveAction:{
		arg view, x, y, modifiers, butnum, clickcount;
		if(butnum.notNil,{// don't add a point when the mouse is released
			xybuf.setn(0,[x,y]);
			~mean_seeds.addPoint(counter,xybuf,{
				~mean_seeds.print;
			});
			counter = counter + 1;
		});
	});})
}
)

(
~labels = FluidLabelSet(s);
~mean_seeds.size({
	arg sz;
	~kmeans = FluidKMeans(s,sz).setMeans(~mean_seeds);
	~kmeans.fitPredict(~ds,~labels,{
		~labels.dump({
			arg labels;
			var colors = {Color.rand} ! sz;
			labels["data"].keysValuesDo{
				arg id, cluster;
				cluster = cluster[0].asInteger;
				~fp.pointColor_(id,colors[cluster]);
			};
		});
	});
});
)

Thanks for your invaluable help so far!

jan · March 27, 2022, 3:55pm

Hi @tedmoore,
for the sake of convenience when seeding kmeans with large dataset visualisations: could one introduce a tracking option (e.g. just a colourful dot) for the places one has clicked, so as not to double seed clusters accidentally? otherwise its easy to lose track around the abstract visualisations…
heres the respective code you had provided:

(
Window.closeAll;
fork{
	var counter = 0;
	var xybuf = Buffer.alloc(s,2);
	//~yds =FluidDataSet(s).read("/Users/jan/Downloads/DataSet/DataSetUmap-jan.json");
	~yds=~yds_umap;
	FluidNormalize(s).fitTransform(~yds,~yds);

	~ymean_seeds = FluidDataSet(s);
	~yds.dump({ arg dict;
		~yfp = FluidPlotter(bounds:Rect(800,0,800,800),dict:dict,mouseMoveAction:{
			arg view, x, y, modifiers, butnum, clickcount;
			if(butnum.notNil,{// don't add a point when the mouse is released
				xybuf.setn(0,[x,y]);
				~ymean_seeds.addPoint(counter,xybuf,{
					~ymean_seeds.print;
				});
				counter = counter + 1;
			});
	});})
}
)

its really been a joy so far to work with this!

greetings,

jan

tedmoore · March 28, 2022, 11:28am

Hi @jan,

That’s a nice idea. Here’s some code that implements it.

This is not a feature that I think we’ll be adding to FluidPlotter, so I’ve just overlaid a UserView here and am using that to display some red dots based on the means! Let me know if this is what you’re after!

Cheers,

T

(
var counter = 0;
var xybuf = Buffer.alloc(s,2);
Window.closeAll;
~yds = FluidDataSet(s).read("DataSet/DataSetUmap-jan.json".resolveRelative);
FluidNormalize(s).fitTransform(~yds,~yds);

~ymean_seeds = FluidDataSet(s);
~yds.dump({
	arg dict;
	fork({
		var win = Window(bounds:Rect(0,50,800,800));
		var uv, means = List.new;
		~yfp = FluidPlotter(win,Rect(0,0,win.bounds.width,win.bounds.height),dict:dict,mouseMoveAction:{
			arg view, x, y, modifiers, butnum, clickcount;
			if(butnum.notNil,{// don't add a point when the mouse is released
				means.add([x,y]);
				xybuf.setn(0,[x,y]);
				~ymean_seeds.addPoint(counter,xybuf,{
					~ymean_seeds.print;
					{win.refresh}.defer;
				});
				counter = counter + 1;
			});
		});
		uv = UserView(win,Rect(0,0,win.bounds.width,win.bounds.height))
		.acceptsMouse_(false)
		.drawFunc_{
			means.do{
				arg xy;
				var width = 12;
				var half_width = width / 2;
				var x = xy[0].linlin(0,1,0,win.bounds.width);
				var y = xy[1].linlin(0,1,win.bounds.height,0);
				Pen.circle(Rect(x - half_width,y - half_width,width,width));
				Pen.color_(Color.red);
				Pen.fill;
			}
		};
		win.front;
	},AppClock);
});
)

jan · March 28, 2022, 6:59pm

Hi @tedmoore,

im getting the following error when applying the code:
(seems like my Pen class has no circle method?)

DataSet 53177:
rows: 4 cols: 2
0 0.023929 0.49748
1 0.55164 0.90428
2 0.040302 0.43325
3 0.54156 0.062972

ERROR: Message ‘circle’ not understood.
Perhaps you misspelled ‘idle’, or meant to call ‘circle’ on another receiver?
RECEIVER:
class Pen (0x7fa5a81c5f80) {
instance variables [19]
name : Symbol ‘Pen’
nextclass : instance of Meta_Penv (0x7fa5e01efa40, size=19, set=5)
superclass : Symbol ‘Object’
subclasses : instance of Array (0x7fa5c89e0b40, size=1, set=2)
methods : nil
instVarNames : nil
classVarNames : instance of SymbolArray (0x7fa5a81c6100, size=1, set=2)
iprototype : nil
cprototype : instance of Array (0x7fa5a81c61c0, size=1, set=2)
constNames : nil
constValues : nil
instanceFormat : Integer 0
instanceFlags : Integer 0
classIndex : Integer 294
classFlags : Integer 0
maxSubclassIndex : Integer 295
filenameSymbol : Symbol ‘/Applications/SuperCollider.app/Contents/Resources/SCClassLibrary/Common/GUI/Base/QPen.sc’
charPos : Integer 0
classVarIndex : Integer 333
}
ARGS:
Instance of Rect { (0x7fa5cd214ef8, gc=10, fmt=00, flg=00, set=02)
instance variables [4]
left : Float 13.143577 E6D9F464 402A4982
top : Float 396.015113 E77F0862 4078C03D
width : Integer 12
height : Integer 12
}
CALL STACK:
DoesNotUnderstandError:reportError
arg this =
Nil:handleError
arg this = nil
arg error =
Thread:handleError
arg this =
arg error =
Object:throw
arg this =
Object:doesNotUnderstand
arg this =
arg selector = ‘circle’
arg args = [*1]
< FunctionDef in closed FunctionDef >
arg xy = [*2]
var width = 12
var half_width = 6.0
var x = 19.143576826196
var y = 402.01511335013
ArrayedCollection:do
arg this = [*4]
arg function =
var i = 0
List:do
arg this =
arg function =
UserView:doDrawFunc
arg this =
^^ ERROR: Message ‘circle’ not understood.
Perhaps you misspelled ‘idle’, or meant to call ‘circle’ on another receiver?
RECEIVER: Pen

tedmoore · March 28, 2022, 7:11pm

Oops! Try replacing circle with addOval. I guess circle is an extension I must have installed.
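For reference, here is a tiny stand-alone sketch of the same red dot drawn with `addOval`. The window size and the position of the dot are arbitrary and have nothing to do with the dataset:

```supercollider
(
var win = Window("dot", Rect(100, 100, 200, 200));
UserView(win, Rect(0, 0, 200, 200)).drawFunc_{
	var width = 12;
	var half_width = width / 2;
	// an oval inscribed in a square Rect is a circle, here centered at (100, 100)
	Pen.addOval(Rect(100 - half_width, 100 - half_width, width, width));
	Pen.color_(Color.red);
	Pen.fill;
};
win.front;
)
```

In the seeding code above, only the `Pen.circle(...)` line needs to change; the Rect argument stays the same.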

jan · March 28, 2022, 7:27pm

working perfectly now!
very handy feature, thank you @tedmoore!

jan · March 29, 2022, 8:31am

Hi @tedmoore,

a short question about the example below regarding the variable “num”: why is it used for the grain position (instead of only start)? Is the purpose to center the grain at the middle of the slice instead of the beginning (as the variable represents the number of frames for a slice)?
Im asking because im aiming to use it with PlayBuf instead!
Thanks!

center_seconds = (start + (num/2)) / BufSampleRate.ir(~mono_buf);
TGrains.ar(1,trig,~mono_buf,1,center_seconds,num/BufSampleRate.ir(~mono_buf)).dup;

taken from here:

// 7. Put the "play info" of all the slices in each cluster into a buffer (one buffer per cluster)
(
fork{
	var tmpbuf = Buffer(s);
	~play_info_clusters = {Buffer(s)} ! ~clusters.size;
	~clusters.keysValuesDo{
		arg cluster_i, list;
		list.do{
			arg id, i;
			"cluster %\tid %".format(cluster_i,id).postln;
			~ds_play_info.getPoint(id,tmpbuf);
			s.sync;
			// write the first frame to channel 0, frame "i"
			FluidBufCompose.processBlocking(s,tmpbuf,startFrame:0,numFrames:1,destination:~play_info_clusters[cluster_i.asInteger],destStartFrame:i,destStartChan:0);

			// write the second frame to channel 1, frame "i"
			FluidBufCompose.processBlocking(s,tmpbuf,startFrame:1,numFrames:1,destination:~play_info_clusters[cluster_i.asInteger],destStartFrame:i,destStartChan:1);
		}
	};
	s.sync;
	// {~play_info_clust_5.plot}.defer;
};
)

// 8. playback according to the buffer
(
~play_all_in_cluster = {
	arg cluster_num;
	{
		var trig = Impulse.kr(1);
		var index = PulseCount.kr(trig) - 1;
		var start, num, center_seconds;

		index.poll;

		FreeSelf.kr(index > BufFrames.ir(~play_info_clusters[cluster_num]));

		# start, num = BufRd.kr(2,~play_info_clusters[cluster_num],index,0,1);
		start.poll;
		num.poll;
		center_seconds = (start + (num/2)) / BufSampleRate.ir(~mono_buf);

		TGrains.ar(1,trig,~mono_buf,1,center_seconds,num/BufSampleRate.ir(~mono_buf)).dup;
	}.play;