SC: FluidKMeans weirdness

Greetings,

My SuperCollider FluidKMeans object is not behaving as expected.

I’ve made some dummy data in 3 dimensions with 30 points gaussian clustered around [0,0,0], 30 points gaussian clustered around [0,0,1], and 40 points gaussian clustered around [1,1,1].

However, when I run .fit with k=3, it tells me “Warning: empty cluster”, which I really think shouldn’t happen since I have such clear clusters.

When I make a FluidLabelSet and run FluidKMeans .getClusters, it reports back that all the points are part of the same cluster. Again, that doesn’t make sense to me.

Is this a bug? Am I thinking about my data incorrectly? Am I using it incorrectly?

Lastly, there’s a block of code in there where I try to run through all the points in the FluidDataSet and print them to make sure they look right… there’s one block commented out that I really feel like should work fine but seems to not. In the working one, I had to package this into a Task and use an s.sync in order for the buffers to behave properly… but again, seems like the first one should work? Is this a SuperCollider thing I’m missing or is it a Fluid thing? Or a Ted thing?

(code below)

Thank you thank you!

s.boot;

Buffer.freeAll;
// choose number of dimensions
~n_dims = 3;
// make sure the dataset is clear... in case i'm testing it a few times in a row...
~dataset.clear;
// make empty dataset
~dataset = FluidDataSet(s,\kmeans_test,~n_dims);

(
// fill 'er up
100.do({
	arg i;
	var x,y,z;
	var dev = 0.1;

	if(i < 30,{
		// 30 points clustered around [0,0,0]
		x = 0.0;
		y = 0.0;
		z = 0.0;
	},{
		if(i < 60,{
			// 30 points clustered around [0,0,1]
			x = 0.0;
			y = 0.0;
			z = 1.0;
		},{
			// 40 points clustered around [1,1,1]
			x = 1.0;
			y = 1.0;
			z = 1.0;
		});
	});

	x = gauss(x,dev);
	y = gauss(y,dev);
	z = gauss(z,dev);

	[x,y,z].postln;

	Buffer.loadCollection(s,[x,y,z],1,{
		arg buf;
		~dataset.addPoint(i.asSymbol,buf);
	});
});
)

/*(
// look through all the points to make sure they look right...
100.do({
	arg i;
	Buffer.alloc(s,~n_dims,1,{
		arg buf;
		~dataset.getPoint(i.asSymbol,buf,{
			buf.getn(0,3,{
				arg points;
				points.postln;
				"%:\t%\t%\t%".format(i,points[0],points[1],points[2]).postln;
			});
		});
	});
});
)*/

(
// look through all the points to make sure they look right...

// for some reason the commented out version above doesn't work for me...
// but i feel like it should.

Task({
	100.do({
		arg i;
		var buf = Buffer.alloc(s,~n_dims);
		s.sync;
		~dataset.getPoint(i.asSymbol,buf,{
			buf.getn(0,3,{
				arg points;
				//points.postln;
				"%:\t\t%\t\t%\t\t%\t\t%".format(i,points[0].round(1),points[1].round(1),points[2].round(1),points).postln;
			});
		});
	});
}).play;
)

// make an empty buffer to fill with a specific data point from dataset
b = Buffer.alloc(s,~n_dims);

(
// put data point into buffer b
// this one should be shaped like this: __/
~dataset.getPoint(\50,b,{
	"done".postln;
	defer{b.plot};
});
)

~kmeans = FluidKMeans.new;

(
~kmeans.fit(~dataset,3,1000,action:{
	"done".postln;
});
// returns Warning: empty cluster, but i feel like it shouldn't since there are clearly 3 clusters...?
)

// need to make a fluid labelset in order to use "get clusters"
~labelset = FluidLabelSet(s,\test_labelset);

(
~kmeans.getClusters(~dataset,~labelset,{
	"done".postln;
});
)

(
100.do({
	arg i;
	~labelset.getLabel(i.asString,{
		arg label;
		"%: %".format(i,label).postln;
	});
});
// it says they're all part of the same cluster...?
)

Indeed. You forgot to pass the server:

~kmeans = FluidKMeans.new(s);

But it does indeed not seem to produce the right number of clusters. I'll need to investigate later today, since the examples I provide (super simple ones) do the same thing and work...

Ah! You forgot to ‘predict’ the labels:
~kmeans.predict(~dataset,~labelset, {|x| ("Size of each cluster" + x).postln})
Run that after defining ~labelset and you’re golden!

The 1000 iterations is a little OTT, and it does at times return an ‘empty cluster’ message, which is strange. I’m sure @groma will know why though.


K-means has a stopping condition: eventually no points change cluster and the algorithm terminates. How long that takes typically depends on the number of points and how clear the clusters are. We provide the number only as a maximum. So yes, I imagine 1000 is never reached here.
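
To make the stopping condition concrete, here is a minimal plain-sclang sketch of the idea. It is not the FluidKMeans implementation: ~kmeansSketch, its arguments and the "keep an empty cluster's centroid where it is" rule are all assumptions made up for illustration. It only shows how the loop ends as soon as no point changes cluster, with maxIter acting as an upper bound.

```supercollider
(
// Hypothetical helper, NOT part of FluCoMa: a plain-sclang k-means sketch.
~kmeansSketch = { |points, k = 3, maxIter = 100|
	// start from k points picked at random from the data
	var centroids = points.scramble.keep(k);
	var assignments = Array.fill(points.size, -1);
	var iterations = 0;
	var changed = true;

	// stop when nothing moved in the last pass, or when maxIter is reached
	while({ changed and: { iterations < maxIter } }, {
		// assign every point to its nearest centroid (squared Euclidean distance)
		var newAssignments = points.collect({ |p|
			centroids.collect({ |c| (p - c).squared.sum }).minIndex;
		});
		changed = (newAssignments != assignments);
		assignments = newAssignments;

		// move each centroid to the mean of its members
		// (assumption: a centroid with no members stays where it is)
		centroids = centroids.collect({ |c, j|
			var members = points.select({ |p, i| assignments[i] == j });
			if(members.isEmpty, { c }, { members.sum / members.size });
		});
		iterations = iterations + 1;
	});

	(assignments: assignments, centroids: centroids, iterations: iterations);
};
)

(
// same shape of dummy data as in the first post: 30 + 30 + 40 points
var data = Array.fill(100, { |i|
	var centre = if(i < 30, { [0.0, 0.0, 0.0] }, {
		if(i < 60, { [0.0, 0.0, 1.0] }, { [1.0, 1.0, 1.0] })
	});
	centre.collect({ |v| v.gauss(0.1) });
});
var result = ~kmeansSketch.(data, 3, 1000);
"stopped after % iterations".format(result[\iterations]).postln;
)
```

With clusters this well separated, the reported iteration count should normally stay far below the 1000 passed in as the maximum.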
