SC: FluidNormalize and FluidStandardize not working?

Hello,

I believe these two are not working in SuperCollider. When I try to use ‘.fit’ or ‘.normalize’ I get an error: “ERROR: DataSet doesn’t exist”, but I’m pretty sure it does. See below.

Thanks! Looking forward to using this!

Ted

(
// set some variables
~vec_size = 10;
~dataset.clear;
~dataset = FluidDataSet(s,\test,~vec_size);
)

(
// fill up the dataset
20.do({
	arg i;
	Buffer.loadCollection(s,Array.fill(~vec_size,{rrand(0.0,100.0)}),action:{
		arg buf;
		~dataset.addPoint(i.asInteger.asString,buf);
		buf.free;
	});
});
)

// make a buf for getting points back
~retrieval_buf = Buffer.alloc(s,~vec_size);

(
// look at a point to see that it has points in it
~dataset.getPoint("0",~retrieval_buf,{
	~retrieval_buf.getn(0,~vec_size,{
		arg vec;
		vec.postln;
	});
});
)

(
// look at another point to make sure it's different...
~dataset.getPoint("15",~retrieval_buf,{
	~retrieval_buf.getn(0,~vec_size,{
		arg vec;
		vec.postln;
	});
});
)

// make a FluidNormalize
~normalize = FluidNormalize(s,0,1);

// fit throws an error
~normalize.fit(~dataset,{"done".postln;}); // throws error

// making an empty 'normed_dataset' which is required for the normalize function
~normed_dataset = FluidDataSet(s,\normed,~vec_size);

// normalize throws an error
~normalize.normalize(~dataset,~normed_dataset,{"done".postln;}); // throws error

// try FluidStandardize
~standardize = FluidStandardize(s);

// error
~standardize.fit(~dataset,{"done".postln;});

// try '.standardize'
~standardized_dataset = FluidDataSet(s,\standardized,~vec_size);

// error
~standardize.standardize(~dataset,~standardized_dataset,{"done".postln;});

[edited because I was wrong] OK, I think the documentation is misleading (@weefuzzy and I will make it clearer), but we need to normalize (and standardize) into another dataset. I think that both the order of the docs and the nomenclature are unclear (using 'in-place' for the fit method, for instance).

order of things to do:

  1. make a process instance (which you do)

  2. fit the data to compute the coefficients

  3. normalize a dataset to another dataset (since one does not want to overwrite the original data) or to a point (normalizePoint)
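
In code, the order above looks like this (a minimal sketch; ~src is assumed to be an already-filled FluidDataSet, and \dst is just a placeholder name for a fresh destination):

```supercollider
// 1. make a process instance
~normalize = FluidNormalize(s, 0, 1);

// 2. fit to compute the per-dimension scaling coefficients
~normalize.fit(~src, { "fitted".postln; });

// 3. write the scaled points into a separate destination dataset
~dst = FluidDataSet(s, \dst, 10);
~normalize.normalize(~src, ~dst, { "normalized".postln; });
```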

@weefuzzy and I are meeting tomorrow to talk about this amongst other things, but feel free to brainstorm with us now on an interface that makes more sense to you. We have a reason to propose this one, but if you don’t mind I’d like to use your intuition and experience as a new proposal against which I can test our rationales. Does it make sense?

btw, there are also bugs in the SC implementation, since even when doing things in the right order I get errors. Maybe @weefuzzy will see the error more clearly. I’m doing a Max version now that works (if you want to check)

@weefuzzy is the line

~normalize.normalize(~dataset,~normed_dataset,{"done".postln;});

right or should it be

~normalize.normalize(\test,\normed,{"done".postln;});

I get a different error in either case…

Either should work, but I think I might have neglected to implement FluidDataSet.asUGenInput, which FluidNormalize seems to think should be a thing.
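
If that’s the case, the missing piece would presumably be a one-line class extension along these lines (a guess at the shape of the fix, not the shipped code):

```supercollider
// hypothetical extension file (e.g. extFluidDataSet.sc):
// let a FluidDataSet answer with its server-side name when it is
// flattened into server message arguments
+ FluidDataSet {
	asUGenInput { ^this.asString }
}
```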


ok so @tedmoore’s code should work if I use the server-side name? I’ll give it a spin.

For now, use asString explicitly:

~normalize.fit(~dataset.asString,{"done".postln;}); 
~normalize.normalize(~dataset.asString,~normed_dataset.asString,{"done".postln;}); 

This is the Max version of the same patch. I’ll try your hack of asString and post the code for pleasure for now.

<pre><code>
----------begin_max5_patcher----------
3082.3oc4bs0iipiD94t+UXEoUqzpL4fuC6S69eXeaznVNINo4LDHKP5KyQm
4295aPHoAhSZfdN81OjPaiwt9bUkq5yl7G2e2rkYuHKlA9mfuBt6t+396tyT
jtf6b++cy1IdYUhnvbayROraoLe1baUEkulHMkWUR1gxDYY4q6k1mppBvrkh
zsy.eycKpGQbp5lLOPnqv8hxUOFmt8gb4pRaSYP5hf4.JLz7Uf9SDZQP8CJd
soqyV96eIJZV8CJWrSVJyePlJVZGcAG6Y63yz0Hcg+482q+Xtuhu7YU2U0Wk
xWLi0YKOrYiL+mfhx0pKA+qBwt8E.XvUfSaRxDk2NXQifKnyA3.hFlfTTunE
e1TgHoY469XgDTnQyAxv8BIrgCRVksamLs7MXxl3RPYFXqT80iRvpL4lMEy0
WmpzbDoqE4qi+gTeOhzLUw4f0hRQgr9QkDmJWkcH077Pchj9hPPXfATnXnAa
fluv31gnvf1gnfADh9uGj4uN68KXAQFQgQ35uHVwKnC4BNbx0NYQgXq7MxUg
Zd8ZbZ1txNpaINhZjTFyX+i607OD0t.CGPAdUhTjOtRLwHjLrQvse1oDiGNI
tKUWX.vt3XgxhVTBD4xlV0yAO+X7pGA6jhTyMLGrIKGHEpxVkkbXW5usNV8f
KhyTWIKVkGuuLK23d.HdRlqPYP1FfHIwTjZHj+J34X0+tTBBVnbYr1TQUeBV
KeJVTpdb.3hVcfPe+1YDHS6iMT8kF9sKRyIcLKPFeCM0..QQDlBQBHHHBB9h
tHNjYJhRXp5zEoVOfYuJHL.4JiDwo1xPgXhtAPLhXJggfT3U5Gd.zxCUFzZ.
1tfN05OCS6.foCmZdGqq+iDv17rC6utUx0Khe8xd0xSNsKduQ.FxF8XZ1Kke
uJFuqNLl1kd7kkddjYpOreomO5y7G9QrZRGbMS6l31le763zxaPKHfEYvAlM
OfdCiKLrcb.Ob3fR6uT4kUGnlJRt8YwGWKvWKgtsF5SeH.QrdWMQynBwoWfH
Z3.ht70hBF0E3gJmuZeetIdTjIZ9theOXzM.J1mqlq2TOqCLe9k+15q2kv0m
LiZ12jMCiaRJFxY8lMCb.QiSx3WUU7tC6LOrnpxhSqJKXJHFnFKrqFdAlAPe
nLCrI4P75Etz39InTVT9Ek5x0s7Yk+y5DhudLKzle.2RlBjzO+.3Q2KpEWZD
e7OmV7HBZy7kgrNSo8hGjw2Ypgb.k1w7SoBPURsRy3lNErIU.Pdu4SEQG+7o
5isDMqRhj3BoopMGTIA4Lw7k7D76O2mvPsSnJMnPiFj1iTa.FO5uLbmfngdS
cBO7S.0IPabUdQcBm8Yf5DWridQcBmOoTmrTV9rTYgqU3zLZnT+.OqLlkVlR
1mUDanzHt.rRlWJhS0V7OJdRBDk.Exo7WlkJAwkxcpKbjtnbknZf8Ipq8rJZ
Oudx61ThFhaxNhkadVG1RrnofcDVDhinJ3kSPAgP0EvPLBq9NDGPnb0EXr5N
hzWPIDaADNWeKjPLkqqgQUt8zMdTUSwv2R8QmpoAepn9.YI6vKlO3vIh4C2d
4L9Te3jdGyGVp76T5QeZY9.gYdS7AG+4k3CqBfe7dvI+0m2iv.uo8fS+.o83
58GbC45aixth1ivdWDfw+TS6gCJ7g0CV3ubrdn0VlbZObTl5X8neWnL1v453
VhzMPc0.j9VjMMFlYsCXTTe4uwHSTZo.AXi7Ycn5qdTt561cHUkj9qlsIccr
9vhnZ7wsD0r8luez.GXyNmZPCLjzKZfGezXm36R0.vHewxB8N5pzIdRjbPVz
Xae+sy2RXCpr.7edTBREJYeqtkZxOzI572K.IhkxDctLh7kwk4BMjW.NTbPj
79QwHabHNdAX8hgSvgo.Et.AXKn5MZloOtJL.MZAGvwKHpXGWf0YNvUAOq9.
hWDNxqVaM2rti4j97vPi9bkjBIvddgfWNIEJehRRw33XZxRwI9Al0ii5W7Ye
dyRIv5PHDe4rTnzOuYovrooZWo4BYoPXiNNzcX62fExsD3k4fAACIdD2NJb7
YUc7oQF514D7koQlP9TbB7rp51.B5mEYBb7E3wNsbZf83DiubZ43weg9VRwZ
xytxtSQHqtND0qJOZ.838KWp4Ud6rN+6O0bL+CM078.cJHOXVMnXTWBvpc.s
65QXuqGh6hFqlOc0TtUZch6cy1DmHeRkKsJ+nF28cyD622n36ZzDMF86YlGD
edcQwo1hv0EkKeJtp8z5RE4JYrTIfGxsn0K0IPqeLYqk4oGhqMzMyV2W8.qg
EpEJbGXaaPjjvF4SolY2ljs56x0MzHTSI6kowo6ykEpUvMGr2Spdsbi3PR4C
axTSr5E6zCCsIYK0uQrR1YiU4WZkt+cdbc1i2Maad75rT8f3DrVWbU2ocRZS
2tovXtiTw9VZrZtWgKcTYgRHOTrTjqmJblGnpJKyxRNsp51kH2T5pdebZ5Yn
XY19tqLOd6i8z1kYpJ202y1TSwCGRs09fxZr7gBwSmh1khjDm44oO9WDJ+Vh
RYYrcJ.ETWo0EwiEqxyRRNQds07TK0nOA3qjOGut7QSG0TYvxzfSIZV8r753
spURNsrRw1hSK4M9KTEcXoyJ8gR4t8IJo3za3jW0rlljMcjcR4m4Py5Wn1dq
0gQW9m57XQ41yeaR7AUeVq7dleJ3Y8SSBNZTSE4OmOxhSWKeog6AmOcmmhaD
VNwOe64.HVut+b.5AM6dgfaErCrwI+1Dl6LsfthjZ7QvRPLX4M.RlrNOKZha
DufH9QkydgqdzNQ+e.bAsPDjG4GdA4e33UqTVcM.2oTW0IgEW.xb629ksHgr
owh7T1yOOMOCm2Ep3XWKc7mafgi6nR09ozr8s8J6LPKhfCr4AYOAkP69117s
n7MvHom0QlFMu+g9j+LBJbnKnvEYydFGV8dL0uB2G+J.4JcoL8FWF.uQGaua
vxxkXH7RfE+CGrNiC5qAo5iK5qF4bLSQgsSI8a1zptAN7zAbneA.NK0Elsh1
GfiNM.WeKGnT2Vkkb1Y8dXcr6RO.530om7ClH+5FA3VUU5V2HvGcCRfeoJg6
AKfu2bkNRtfdY8NxpzHB55aGQKxNjupBYbq0.NUbTYIWFmVS0xWOFDD.B9lW
SnW6ff64fvMXGkwfV9P9.D7QbPP8bPny5YzFDZkXulNni3f.4sh4XNHfVk9K
NHXi8fvGj.MllGF0defB9XOH71aU6iBWgUjTOSyS45Grbx9fnrLOd4gRq20l
rteUbGtMIaoHwwLXMyx8Qj3Q1Fu+338J26C03tTM5txMC6F2gC2Y9o9yV28q
w+b5V8KtTEaBej+jK4NDDrd2QPHc1zbNTcusfS96Xp8mrA2KHXTuugovo7EL
sFOZ75kpKaT29bGXT85kF16Iw0CqES6NM7OK5btmTGF8FOn5ex3N5AsCum52
d7F2TV9Z6tP19wYbP6ZVX6cM7800TejZxw0Lt8dB2LBttBiMZH5IjOxDbH5I
nO8Dd.5oPuzNCNlIx6qmfWZdJbHjoPezH3CQOw8.8BYCQOw7omnCw7Ddxz8P
SUOw8YdhMDySb1j0SDO5I9PnQX5oKY4xGj4Ie7kyQCQO4itGeH7ky8wWNeH7
vd1J4cn6MD98XSl8Di4iFwPDGAymHVFBEBpOfGMZ.THn93LhNDJDTpGSSzgP
gfv7vYDkNT8zkPO7PLOQ7wU9P30i.mnNB6iNNZHB1CSmHQBENUcjORDdHzvQ
dXzRdqQqMk3yN7q593rC85YG302dXW69ftd9gb0vcnkvfyRCulquchWXZFKL
RS0AM8T.PrZkLsbUVhcT8U8uFIyM+jjDgfL8U7fnv.xbyOGLeqAyFl17PbpV
hkUsMXdiOZ1hkaazGPWstOBVPq.y4tIK8aj54bgLamZzGWQAPyI2SD3F6PW6
h7IBKAgonH8XP+SuBwbEhSnP2vevFVmvMT6Crka2DmjTO7ZR7aE+Oy1lKVG2
XCYsRCrdtio9iOu0qfMOYvllgpZFOLTgCya8p21rpVghBfQFcDrSGQekpH3Y
sRjt0cFh4MNHr6yy1mkWeXpWfipu+CkY0BZ8F3VSV9Un802z2oVG9MK17fv5
g1kYrPTpUAHyUJTRIk2plkpM+48+O5Ms+k.
-----------end_max5_patcher-----------
</code></pre>

Here is the working code with the temporary fix to both, so you can get going. The next step is to normalise/standardise a query point.

I’m continuing to develop this example in Max and SC in parallel as part of my super-simple-X-example series, would that help?

p

(
// set some variables
~vec_size = 10;
~dataset.clear;
~dataset = FluidDataSet(s,\test,~vec_size);
)

(
// fill up the dataset
20.do({
	arg i;
	Buffer.loadCollection(s,Array.fill(~vec_size,{rrand(0.0,100.0)}),action:{
		arg buf;
		~dataset.addPoint(i.asInteger.asString,buf);
		buf.free;
	});
});
)

// make a buf for getting points back
~retrieval_buf = Buffer.alloc(s,~vec_size);

(
// look at a point to see that it has points in it
~dataset.getPoint("0",~retrieval_buf,{
	~retrieval_buf.getn(0,~vec_size,{
		arg vec;
		vec.postln;
	});
});
)

(
// look at another point to make sure it's different...
~dataset.getPoint("15",~retrieval_buf,{
	~retrieval_buf.getn(0,~vec_size,{
		arg vec;
		vec.postln;
	});
});
)

// make a FluidNormalize
~normalize = FluidNormalize(s,0,1);

// fits the dataset to find the coefficients
~normalize.fit(~dataset.asString,{"done".postln;}); // throws error

// making an empty 'normed_dataset' which is required for the normalize function
~normed_dataset = FluidDataSet(s,\normed,~vec_size);

// normalize the full dataset
~normalize.normalize(~dataset.asString,~normed_dataset.asString,{"done".postln;}); // throws error

//
(
// look at a point to see that it has points in it
~normed_dataset.getPoint("0",~retrieval_buf,{
	~retrieval_buf.getn(0,~vec_size,{
		arg vec;
		vec.postln;
	});
});
)

// try FluidStandardize
~standardize = FluidStandardize(s);

// error
~standardize.fit(~dataset.asString,{"done".postln;});

// try '.standardize'
~standardized_dataset = FluidDataSet(s,\standardized,~vec_size);

// error
~standardize.standardize(~dataset.asString,~standardized_dataset.asString,{"done".postln;});

(
// look at a point to see that it has points in it
~standardized_dataset.getPoint("0",~retrieval_buf,{
	~retrieval_buf.getn(0,~vec_size,{
		arg vec;
		vec.postln;
	});
});
)

here we go. This will be included in the next alpha, but for it to work now you need to correct the class definitions of FluidNormalize and FluidStandardize (replace dataset.asUGenInput with dataset.asString, 3 times each, and recompile - I can also send the amended ones)

I’ve made it a tutorial, explaining everything and expected results. Let me know if it is clearer.

pa

(
// set some variables
~nb_of_dim = 10;
~dataset.clear;
~dataset = FluidDataSet(s,\test,~nb_of_dim);
)

(
// fill up the dataset with 20 entries of 10 column/dimension/descriptor values each. The naming of the items' labels is arbitrary as usual
20.do({
	arg i;
	Buffer.loadCollection(s,Array.fill(~nb_of_dim,{rrand(0.0,100.0)}),action:{
		arg buf;
		~dataset.addPoint("point-"++i.asInteger.asString,buf);
		buf.free;
	});
});
)

// make a buf for getting points back
~query_buf = Buffer.alloc(s,~nb_of_dim);

// look at a point to see that it has points in it
~dataset.getPoint("point-0",~query_buf,{~query_buf.getn(0,~nb_of_dim,{|x|x.postln;});});

// look at another point to make sure it's different...
~dataset.getPoint("point-7",~query_buf,{~query_buf.getn(0,~nb_of_dim,{|x|x.postln;});});

///////////////////////////////////////////////////////
// exploring full dataset normalization and standardization

// make a FluidNormalize
~normalize = FluidNormalize(s,0,1);

// fits the dataset to find the coefficients
~normalize.fit(~dataset,{"done".postln;}); // throws error

// making an empty 'normed_dataset' which is required for the normalize function
~normed_dataset = FluidDataSet(s,\normed,~nb_of_dim);

// normalize the full dataset
~normalize.normalize(~dataset,~normed_dataset,{"done".postln;}); // throws error

// look at a point to see that it has points in it
~normed_dataset.getPoint("point-0",~query_buf,{~query_buf.getn(0,~nb_of_dim,{|x|x.postln;});});
// 10 numbers between 0.0 and 1.0 where each column/dimension/descriptor is certain to have at least one item on which it is 0 and one on which it is 1
// query a few more for fun

// try FluidStandardize
~standardize = FluidStandardize(s);

// fits the dataset to find the coefficients
~standardize.fit(~dataset,{"done".postln;});

// standardize the full dataset
~standardized_dataset = FluidDataSet(s,\standardized,~nb_of_dim);
~standardize.standardize(~dataset,~standardized_dataset,{"done".postln;});

// look at a point to see that it has points in it
~standardized_dataset.getPoint("point-0",~query_buf,{~query_buf.getn(0,~nb_of_dim,{|x|x.postln;});});
// 10 numbers that are standardized, which means that, for each column/dimension/descriptor, the average of all the points will be 0 and the standard deviation 1

/////////////////////////////////////////////////////
// exploring point querying concepts via norm and std

// Once a dataset is normalized / standardized, query points have to be scaled accordingly to be used in distance measurement. In our instance, values were originally between 0 and 100; now they will be between 0 and 1 (norm), or their average will be 0 (std). If we have data that we want to match from a similarly ranged input, which is usually the case, we will need to normalize the query point in each dimension using the same coefficients.

// first, make sure you have run all the code above, since we will query these datasets

// get a known point as a query point
~dataset.getPoint("point-7",~query_buf);

// find the 2 points with the shortest distances in the dataset
~tree = FluidKDTree.new(s);
~tree.fit(~dataset);
~tree.kNearest(~query_buf,2, {|x| ("Labels:" + x).postln});
~tree.kNearestDist(~query_buf,2, {|x| ("Distances:" + x).postln});
// its nearest neighbour should be itself, and the distance should be 0. The second point depends on your input dataset.

// normalise that point (~query_buf) to be at the right scale
~normbuf = Buffer.alloc(s,~nb_of_dim);
~normalize.normalizePoint(~query_buf,~normbuf);
~normbuf.getn(0,~nb_of_dim,{arg vec;vec.postln;});

// make a tree of the normalized database and query with the normalize buffer
~normtree = FluidKDTree.new(s);
~normtree.fit(~normed_dataset);
~normtree.kNearest(~normbuf,2, {|x| ("Labels:" + x).postln});
~normtree.kNearestDist(~normbuf,2, {|x| ("Distances:" + x).postln});
// its nearest neighbour is still itself, as it should be, but the 2nd neighbour will have changed. The distance is now different too

// standardize that same point (~query_buf) to be at the right scale
~stdbuf = Buffer.alloc(s,~nb_of_dim);
~standardize.standardizePoint(~query_buf,~stdbuf);
~stdbuf.getn(0,~nb_of_dim,{arg vec;vec.postln;});

// make a tree of the standardized database and query with the standardized buffer
~stdtree = FluidKDTree.new(s);
~stdtree.fit(~standardized_dataset);
~stdtree.kNearest(~stdbuf,2, {|x| ("Labels:" + x).postln});
~stdtree.kNearestDist(~stdbuf,2, {|x| ("Distances:" + x).postln});
// its nearest neighbour is still itself, as it should be, but the 2nd neighbour will have changed yet again. The distance is different too

// where it starts to be interesting is when we query points that are not in our original dataset

// fill with known values (50.0 for each of the 10 columns/dimensions/descriptors, aka the theoretical middle point of the multidimensional space). This could be anything, but it is fun to aim at the middle.
~query_buf.fill(0,~nb_of_dim,50);

// normalize and standardize the query buffer. Note that we do not need to re-fit, since we have not added any points to our reference dataset
~normalize.normalizePoint(~query_buf,~normbuf);
~standardize.standardizePoint(~query_buf,~stdbuf);

// query the single nearest neighbour via 3 different data scalings. Depending on the random source at the beginning, you will get small to large differences between the 3 answers!
~tree.kNearest(~query_buf,1, {|x| ("Original:" + x).post;~tree.kNearestDist(~query_buf,1, {|x| (" with a distance of " + x).postln});});
~normtree.kNearest(~normbuf,1, {|x| ("Normalized:" + x).post;~normtree.kNearestDist(~normbuf,1, {|x| (" with a distance of " + x).postln});});
~stdtree.kNearest(~stdbuf,1, {|x| ("Standardized:" + x).post; ~stdtree.kNearestDist(~stdbuf,1, {|x| (" with a distance of " + x).postln});});

This is great! Thank you. I can change the class files and recompile.

There are still a few “// throws error” comments in there, but those lines no longer throw errors.

Also, I think “, but the 2nd neighbour will have changed.” should probably say:
“, but the 2nd neighbour will probably have changed.” A few times when I ran it all, the neighbor didn’t change! Of course the distance did, though.

Regarding the question of passing the object or the “name” as a symbol to FluidNormalize and FluidStandardize, what makes the most sense to me is what I had tried initially: passing the object. SuperCollider does sometimes use passing a name, most commonly in Synth(), in which one passes the “name” of the synth as defined in the SynthDef() (the “def” being the key signifier that that is what’s going on). Similarly, see OSCFunc vs. OSCdef (yes, it’s a lowercase “d”…). So unless the classes I’m using have a “def” signifier letting me know that that’s what’s going on, I wouldn’t think to pass a name. I hope that’s helpful!
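
That naming convention can be shown in a few lines (a generic SC sketch, nothing FluCoMa-specific):

```supercollider
(
// the "def" pattern: register under a name once, then refer to it by name
SynthDef(\ping, { |out = 0|
	Out.ar(out, SinOsc.ar(880) * Env.perc.kr(doneAction: 2));
}).add;
)

// pass the name: the "def" in SynthDef signals name-based lookup
Synth(\ping);

// whereas most other classes take the object itself, not a name
~b = Buffer.alloc(s, 10);
~b.free; // methods are called on the instance
```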


this is now possible with the mods I’ve done to the class, as suggested in the previous post, and it does feel a lot more natural indeed :wink:

thanks for your help on this - both the recommendation and the error finding were useful!

p

pa


Your explanation does make sense. I do think that it would be confusing if it’s not clear in the docs.

Calling “normalize” seems like it would first “fit” to the sourceDataset and then write the normalized values into the destDataset. I understand now that I need to manually ‘fit’ first. I think this is fine, as long as the docs are clear and there are some comments in the examples that say this as well.

If it’s helpful, the methods that I really like are the Python sklearn ones. There’s a “fit” which, um, yes, fits. Then instead of “normalize” the method is “transform”, which takes in an array of vectors and returns their normalizations based on the previous “fit”. The difference that I like is that there is also a method, “fit_transform”, which takes in an array of vectors, finds the coefficients, then returns the normalized values (and remembers the coefficients). The way these three methods are presented together makes their different functions clear.

Would it make sense to give FluidNormalize a “fit_normalize” method?
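
For what it’s worth, such a convenience could even be sketched today as a user-side class extension (fitNormalize is hypothetical, not an existing FluCoMa method):

```supercollider
// hypothetical convenience method, added via a class extension;
// it chains fit and normalize so one call does both
+ FluidNormalize {
	fitNormalize { |sourceDataSet, destDataSet, action|
		this.fit(sourceDataSet, {
			this.normalize(sourceDataSet, destDataSet, action);
		});
	}
}
```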

Thanks for your speedy replies. I have something I’ll post in code sharing soon!

Yes, passing the object instance was / is actually the intended design. My goof was that I’d had to switch away from internally using asUGenInput to send the instance name to the server (because reasons) and use asString instead, but had failed to update Normalize and Standardize.

I like the sklearn design too, and would be quite cheerful if these lower-level objects reflected it, especially if there were some uniformity across classes (thus allowing easier generic experimentation between different techniques). This stuff is all still up for grabs, so do keep the feedback coming – thanks!

Yeah for sure. In this case, having FluidNormalize and FluidStandardize both use “.transform” would be nice because then if I want to try standardizing instead of normalizing, it could be just one line:

scaler = FluidNormalize(s);
// scaler = FluidStandardize(s);

scaler.fit(mydata,{
    scaler.transform(mydata,mynormeddata,{
        // do something
    });
});

If I want to try one or the other, I can just change which line is commented, and everything else runs smoothly.
