Weird behaviour when loading a large(ish) dataset from file (supercollider)

So I’m trying to cache a dataset to disk, as it doesn’t change and I often need to restart the SuperCollider server and interpreter.

I’m not sure what the correct approach is and I couldn’t find any documentation,
but essentially I’m using the toBuffer method: making a buffer and a label set, writing both to file, then reloading them. I’ve tried different file formats; it doesn’t seem to make a difference.

This works for small datasets but not for large ones, and I don’t know why…
First, a small example that works.

— you will need to set ~pwd to a directory —

FluidDataSet.version // -> Fluid Corpus Manipulation Toolkit: version 1.0.0+sha.3fd541f.core.sha.f86443a
Manjaro Linux – all built from source.


~initalDataSet = FluidDataSet(s);
b = Buffer.loadCollection(s, [1,0.23], 1)
~initalDataSet.addPoint("first", b)
b.setn(0, [-0.3, 4.4])
~initalDataSet.addPoint("second", b)


~initalDataSet.dump{|d| d["data"].keysValuesDo{|k,v| [k,v].postln}}
//[ second, [ -0.30000001192093, 4.4000000953674 ] ]
//[ first, [ 1.0, 0.23000000417233 ] ]

~toBufB = Buffer(s)
~toBufL = FluidLabelSet(s)
~initalDataSet.toBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet = FluidDataSet(s)
~fromBufDataSet.fromBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet.dump{|d| d["data"].keysValuesDo{|k,v| [k,v].postln}}
//[ second, [ -0.30000001192093, 4.4000000953674 ] ]
//[ first, [ 1.0, 0.23000000417233 ] ]

~toBufB.write(~pwd+/+"test/b.wav", headerFormat: "WAV", sampleFormat: "float")
~toBufL.write(~pwd+/+"test/dl.json")

~loadedB = Buffer.read(s, ~pwd+/+"test/b.wav")
~loadedL = FluidLabelSet(s);
~loadedL.read(~pwd+/+"test/dl.json")

~loadedD = FluidDataSet(s);

~loadedD.fromBuffer(~loadedB, 0, ~loadedL)
~loadedD.dump{|d| d["data"].keysValuesDo{|k,v| [k,v].postln}}
//[ second, [ -0.30000001192093, 4.4000000953674 ] ]
//[ first, [ 1.0, 0.23000000417233 ] ]
// woo it works !!!!

Now exactly the same example but with a larger dataset that does not work…

// make large data set
~initalDataSet = FluidDataSet(s);
b = Buffer.loadCollection(s, [1, 0.23], 1)

r = Routine({
	1000.do{|n|
		b.set(0, 1.0.rand);
		b.set(1, 1.0.rand);
		Server.default.sync;
		~initalDataSet.addPoint(n.asString, b);
	};
	"done".postln;
})

r.play

~initalDataSet.dump{|d| d["data"]["504"].postln}
// [ 0.89565253257751, 0.23000000417233 ]

~toBufB = Buffer(s)
~toBufL = FluidLabelSet(s)
~initalDataSet.toBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet = FluidDataSet(s)
~fromBufDataSet.fromBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet.dump{|d| d["data"]["504"].postln}
//[ 0.89565253257751, 0.23000000417233 ]
// right, so writing to a buffer and reading it back works fine

~toBufB.write(~pwd+/+"test/b.wav", headerFormat: "WAV", sampleFormat: "float")
~toBufL.write(~pwd+/+"test/dl.json")

~loadedB = Buffer.read(s, ~pwd+/+"test/b.wav")
~loadedL = FluidLabelSet(s);
~loadedL.read(~pwd+/+"test/dl.json")

~loadedD = FluidDataSet(s);
~loadedD.fromBuffer(~loadedB, 0, ~loadedL)

~loadedD.dump{|d| d["data"]["504"].postln}
// [ 0.225949883461, 0.23000000417233 ]
// ehh???!?!?

Any ideas? Or is there a better way to do this?

Okay, I’ve noticed something else that leads me to believe there is a bug here…

The second dimension of the data is always stuck at 0.23000000417233 after loading it from the file, as if it is reading too few indices.

Looking at the FluidLabelSet file, you get something like this…

"771": ["771"],
"772": [ "772"],

… but shouldn’t there be two indices here? I’d expect…

"0": ["0"],
"1": ["0"] 

…i.e., two indices in the buffer belonging to a single data label? Or perhaps

"0": ["0"],
"2":["1"] 

Is this a bug or am I just going about this the wrong way?

Okay, here is more info… someone is probably just going to point out a really simple solution (hopefully) and make all this meaningless…

~toBufB.loadToFloatArray(action: {|a| ~array = a })
File.use(~pwd +/+ "test/arr.json", "w", {|f| f.write(~array.asCode.replace($,,",\n"))});
~loadedA = File.readAllString(~pwd +/+ "test/arr.json").compile.()


// so for the 80th data point I'd expect this to look at indices 160 & 161 in the file
~fromBufDataSet.dump{|d| d["data"]["80"].postln}
//  [ 0.21962189674377, 0.29548144340515 ]
~loadedA[160] == 0.21962189674377
~loadedA[161] == 0.29548144340515
// so this works!

~loadedL = FluidLabelSet(s);
~loadedL.read(~pwd+/+"test/dl.json")
~loadedD = FluidDataSet(s);
~loadedD.fromBuffer(~loadedB, 0, ~loadedL)

~loadedL.dump{|d| d["data"]["80"].postln }
//[ 0.32833027839661, 0.55570936203003 ]

// These values come from indices 1558 and 1559 in the array...
// I don't understand where it got those indices from

Hi @Jordan,

Is there a reason not to simply use write and read here (to save / load as json) rather than to/fromBuffer?
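Something like this (an untested sketch from memory – check the current reference for the exact signatures, but write/read should serialise the whole dataset as JSON in one call):

```
// save the dataset directly as JSON
~initalDataSet.write(~pwd +/+ "test/dataset.json");

// later, after a server/interpreter restart:
~loadedD = FluidDataSet(s);
~loadedD.read(~pwd +/+ "test/dataset.json");
~loadedD.dump{|d| d["data"]["504"].postln};
```

That sidesteps the buffer round-trip (and its file-format quirks) entirely.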

… because I didn’t see those methods hidden in the little drop-down menu of the help file…

Thanks so much! I literally spent days trying to figure this out…

However, I am still confused as to why it isn’t working, because I don’t think I’m doing anything wrong?

Added this to the docs:

I notice these methods are not mentioned in the source code while they are on the binaries (at least on FluCoMa-SC-Mac.dmg – version 1.0.2+sha.2ca6e58.core.sha.804a3b3).

I guess it will be an unholy combination of two things, depending on the shape of the Dataset:

  1. to/fromBuffer has a transpose argument determining whether the data is laid out frame-wise or channel-wise
  2. SC uses libsndfile to save buffers, which has a baked-in upper channel limit (of 1024, IIRC). So, depending on the transpose argument, it’s quite possible to end up with a buffer that will be truncated when written to file, which is unfortunate.
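If you do want to stay with the buffer round-trip, one workaround (an untested sketch – verify which transpose value gives dimensions-as-channels in your build) is to lay the data out so that each point is a frame, keeping the channel count at the number of dimensions:

```
// sketch: with transpose flipped, the buffer should have numDimensions channels
// and numPoints frames, staying well under libsndfile's channel limit for WAV
~toBufB = Buffer(s);
~toBufL = FluidLabelSet(s);
~initalDataSet.toBuffer(~toBufB, 1, ~toBufL); // transpose: 1 (assumed orientation)
~toBufB.write(~pwd +/+ "test/b.wav", headerFormat: "WAV", sampleFormat: "float");

// reload with the same transpose flag
~loadedB = Buffer.read(s, ~pwd +/+ "test/b.wav");
~loadedD = FluidDataSet(s);
~loadedD.fromBuffer(~loadedB, 1, ~toBufL);
```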

Meanwhile, many thanks for being willing to put a PR in for the docs. We’ve actually recently changed how the SC docs are constructed and they now use the same framework as the references for Max and PD, at https://github.com/flucoma/flucoma-docs: I’ll do a review of the way we’re generating for the dataset objects and make sure read/write (which are inherited methods) are more visible.

Ah, so the file I changed is actually a generated document?
Or is it that it will be a generated document soon? Only asking in case you wanted someone to contribute.

This file (if it is the source of the generated doc) has a read method but no write, yet neither appears in the final SC doc.

There are also some examples on there that demonstrate read and write which don’t appear (at least not fully) in the SC doc.

I think the reason I overlooked it was that the FluidDataObject doc has no documented methods, so I assumed I was supposed to ignore them and consider them private… but looking back at it now I really ought to have figured it out!

Are you working from a clone of the git repo? If so that would explain why the final help files are confusing: everything is fine in release packages (as far as I can tell), stuff’s still a bit clunky in the source tree.

‘Proper’ helpfiles are generated at build time from those RST files, and some Python magic, so long as you run cmake with -DDOCS=ON. However, we haven’t yet removed the old, hand rolled schelp files from the SC repo, and the generated files only end up somewhere sensible when the install cmake target is run. I haven’t yet worked out the cmake-fu for having a useful working-tree with the newly generated help files. Getting all this smoother and properly documented is one of my next jobs.
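For anyone following along, the configure/build/install sequence looks roughly like this (paths illustrative – adjust to wherever your clone and Extensions directory live):

```
# illustrative only: enable doc generation, then build and install
cmake -S . -B build -DDOCS=ON
cmake --build build
cmake --install build
```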

We also need to document those base classes, like FluidDataObject, if only to indicate that they’re (uninteresting) base classes.