Weird behaviour when loading a large(ish) dataset from file (supercollider)

So I’m trying to cache a dataset to disk, as it doesn’t change and I often need to restart the SuperCollider server and interpreter.

I’m not sure what the correct approach is and I couldn’t find any documentation,
but essentially I’m using the toBuffer method: making a buffer and a label set, writing both to file, then reloading them. I’ve tried different file formats; it doesn’t seem to make a difference.

This works for small datasets but not for large ones, and I don’t know why…
First, a small example that works.

— you will need to set ~pwd to a directory —

FluidDataSet.version // -> Fluid Corpus Manipulation Toolkit: version 1.0.0+sha.3fd541f.core.sha.f86443a
Manjaro Linux – all built from source.


~initalDataSet = FluidDataSet(s);
b = Buffer.loadCollection(s, [1,0.23], 1)
~initalDataSet.addPoint("first", b)
b.setn(0, [-0.3, 4.4])
~initalDataSet.addPoint("second", b)


~initalDataSet.dump{|d| d["data"].keysValuesDo{|k,v| [k,v].postln}}
//[ second, [ -0.30000001192093, 4.4000000953674 ] ]
//[ first, [ 1.0, 0.23000000417233 ] ]

~toBufB = Buffer(s)
~toBufL = FluidLabelSet(s)
~initalDataSet.toBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet = FluidDataSet(s)
~fromBufDataSet.fromBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet.dump{|d| d["data"].keysValuesDo{|k,v| [k,v].postln}}
//[ second, [ -0.30000001192093, 4.4000000953674 ] ]
//[ first, [ 1.0, 0.23000000417233 ] ]

~toBufB.write(~pwd+/+"test/b.wav", headerFormat: "WAV", sampleFormat: "float")
~toBufL.write(~pwd+/+"test/dl.json")

~loadedB = Buffer.read(s, ~pwd+/+"test/b.wav")
~loadedL = FluidLabelSet(s);
~loadedL.read(~pwd+/+"test/dl.json")

~loadedD = FluidDataSet(s);

~loadedD.fromBuffer(~loadedB, 0, ~loadedL)
~loadedD.dump{|d| d["data"].keysValuesDo{|k,v| [k,v].postln}}
//[ second, [ -0.30000001192093, 4.4000000953674 ] ]
//[ first, [ 1.0, 0.23000000417233 ] ]
// woo it works !!!!

Now exactly the same example but with a larger dataset that does not work…

// make large data set
~initalDataSet = FluidDataSet(s);
b = Buffer.loadCollection(s, [1, 0.23], 1)

r = Routine({
	1000.do{|n|
		b.set(0, 1.0.rand);
		b.set(1, 1.0.rand);
		Server.default.sync;
		~initalDataSet.addPoint(n.asString, b);
	};
	"done".postln;
})

r.play

~initalDataSet.dump{|d| d["data"]["504"].postln}
// [ 0.89565253257751, 0.23000000417233 ]

~toBufB = Buffer(s)
~toBufL = FluidLabelSet(s)
~initalDataSet.toBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet = FluidDataSet(s)
~fromBufDataSet.fromBuffer(~toBufB, 0, ~toBufL)

~fromBufDataSet.dump{|d| d["data"]["504"].postln}
//[ 0.89565253257751, 0.23000000417233 ]
// right, so writing to a buffer and reading it back works fine

~toBufB.write(~pwd+/+"test/b.wav", headerFormat: "WAV", sampleFormat: "float")
~toBufL.write(~pwd+/+"test/dl.json")

~loadedB = Buffer.read(s, ~pwd+/+"test/b.wav")
~loadedL = FluidLabelSet(s);
~loadedL.read(~pwd+/+"test/dl.json")

~loadedD = FluidDataSet(s);
~loadedD.fromBuffer(~loadedB, 0, ~loadedL)

~loadedD.dump{|d| d["data"]["504"].postln}
// [ 0.225949883461, 0.23000000417233 ]
// ehh???!?!?

Any ideas? Or is there a better way to do this?

Okay, I’ve noticed something else that leads me to believe there is a bug here…

The second dimension of the data is always stuck at 0.23000000417233 after loading it from the file, as if it is reading too few indices.

Looking at the FluidLabelSet file, you get something like this…

"771": ["771"],
"772": [ "772"],

… but shouldn’t there be two indices here? I’d expect…

"0": ["0"],
"1": ["0"] 

…i.e., two indices in the buffer belonging to a single data label? Or perhaps

"0": ["0"],
"2":["1"] 

Is this a bug or am I just going about this the wrong way?

Okay, here is more info… someone is probably just going to point out a really simple solution (hopefully) and make all this meaningless…

~toBufB.loadToFloatArray(action: {|a| ~array = a })
File.use(~pwd +/+ "test/arr.json", "w", {|f| f.write(~array.asCode.replace($,,",\n"))});
~loadedA = File.readAllString(~pwd +/+ "test/arr.json").compile.()


// so for the 80th data point I'd expect this to look at indices 160 & 161 in the file
~fromBufDataSet.dump{|d| d["data"]["80"].postln}
//  [ 0.21962189674377, 0.29548144340515 ]
~loadedA[160] == 0.21962189674377
~loadedA[161] == 0.29548144340515
// so this works!

~loadedL = FluidLabelSet(s);
~loadedL.read(~pwd+/+"test/dl.json")
~loadedD = FluidDataSet(s);
~loadedD.fromBuffer(~loadedB, 0, ~loadedL)

~loadedL.dump{|d| d["data"]["80"].postln }
//[ 0.32833027839661, 0.55570936203003 ]

// These values come from indices 1558 and 1559 in the array...
// I don't understand where it got those indices from

Hi @Jordan,

Is there a reason not to simply use write and read here (to save / load as json) rather than to/fromBuffer?
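Something like this (an untested sketch from memory – check the current reference for the exact signatures, but write/read should serialise the whole dataset as JSON in one call):

```
// save the dataset directly as JSON
~initalDataSet.write(~pwd +/+ "test/dataset.json");

// later, after a server/interpreter restart:
~loadedD = FluidDataSet(s);
~loadedD.read(~pwd +/+ "test/dataset.json");
~loadedD.dump{|d| d["data"]["504"].postln};
```

That sidesteps the buffer round-trip (and its file-format quirks) entirely.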

… because I didn’t see those methods hidden in the little drop-down menu of the help file…

Thanks so much! I literally spent days trying to figure this out…

However, I am still confused as to why it isn’t working, because I don’t think I’m doing anything wrong?

Added this to the docs:

I notice these methods are not mentioned in the source code while they are on the binaries (at least on FluCoMa-SC-Mac.dmg – version 1.0.2+sha.2ca6e58.core.sha.804a3b3).

I guess it will be an unholy combination of two things, depending on the shape of the Dataset:

  1. to/fromBuffer has a transpose argument determining whether the data is laid out frame-wise or channel-wise
  2. SC uses libsndfile to save buffers, which has a baked-in upper channel limit (of 1024, IIRC). So, depending on the transpose argument, it’s quite possible to end up with a buffer that will be truncated when written to file, which is unfortunate.
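If you do want to stay with the buffer round-trip, one workaround (an untested sketch – verify which transpose value gives dimensions-as-channels in your build) is to lay the data out so that each point is a frame, keeping the channel count at the number of dimensions:

```
// sketch: with transpose flipped, the buffer should have numDimensions channels
// and numPoints frames, staying well under libsndfile's channel limit for WAV
~toBufB = Buffer(s);
~toBufL = FluidLabelSet(s);
~initalDataSet.toBuffer(~toBufB, 1, ~toBufL); // transpose: 1 (assumed orientation)
~toBufB.write(~pwd +/+ "test/b.wav", headerFormat: "WAV", sampleFormat: "float");

// reload with the same transpose flag
~loadedB = Buffer.read(s, ~pwd +/+ "test/b.wav");
~loadedD = FluidDataSet(s);
~loadedD.fromBuffer(~loadedB, 1, ~toBufL);
```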

Meanwhile, many thanks for being willing to put a PR in for the docs. We’ve actually recently changed how the SC docs are constructed and they now use the same framework as the references for Max and PD, at https://github.com/flucoma/flucoma-docs: I’ll do a review of the way we’re generating for the dataset objects and make sure read/write (which are inherited methods) are more visible.

Ah, so the file I changed is actually a generated document?
Or is it that it will be a generated document soon? Only asking in case you wanted someone to contribute.

This file (if it is the source of the generated doc) has a read method but no write, yet neither appears in the final SC doc.

There are also some examples on there that demonstrate read and write which don’t appear (at least not fully) in the SC doc.

I think the reason I overlooked it was that the FluidDataObject doc has no documented methods, so I assumed I was supposed to ignore them and consider them private… but looking back at it now I really ought to have figured it out!

Are you working from a clone of the git repo? If so that would explain why the final help files are confusing: everything is fine in release packages (as far as I can tell), stuff’s still a bit clunky in the source tree.

‘Proper’ helpfiles are generated at build time from those RST files, and some Python magic, so long as you run cmake with -DDOCS=ON. However, we haven’t yet removed the old, hand rolled schelp files from the SC repo, and the generated files only end up somewhere sensible when the install cmake target is run. I haven’t yet worked out the cmake-fu for having a useful working-tree with the newly generated help files. Getting all this smoother and properly documented is one of my next jobs.
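For anyone following along, the configure/build/install sequence looks roughly like this (paths illustrative – adjust to wherever your clone and Extensions directory live):

```
# illustrative only: enable doc generation, then build and install
cmake -S . -B build -DDOCS=ON
cmake --build build
cmake --install build
```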

We also need to document those base classes, like FluidDataObject, if only to indicate that they’re (uninteresting) base classes.