I have a dataset with 1.25 million points. It loaded and PCA’d like a boss, but now I need to access the dictionary, and it just doesn’t want to give it to me. It seems to just freeze when I use .dump. Is there a good way to get this? Should I just save the JSON file and load it myself?
Give it a shot – this is essentially the same process through which you’d get the Dictionary anyway (i.e. via disk), but there are enough moving parts that I guess something is barfing.
Yeah. Loading the JSON takes about 10 seconds. Dumping, on the other hand, never seems to complete. I’m going to send you the JSON file I am using.
The problem seems to be with the language-side performance of our parseJSON method. I can load the JSON file you sent me OK using String.parseJSONFile; it’s not quick, but it gets there. However, via FluidManipulationClient.dump it seems to be taking its sweet time loading the file language-side: SC has been at 100% CPU for about half an hour now.
Or it could be that there’s a problem in the recursion that FluidManipulationClient.parseJSON uses, as this still hasn’t completed after more than two hours.
That’s using the built-in JSON parser though. I didn’t have problems doing
d = "normalizedPCA0.json".parseJSONFile;
d["data"].keysValuesDo{|k,v| (k->v).postln};
So, I wonder what’s choking SC in your example above (not sure what the result of the assignment to keys~ will be there; was this originally a collect?).
Huh. keysValuesDo actually just traverses the internal array of the Dict (which I had no idea it had) instead of looking up each value by key. Crazy. Still, a Dictionary with that many elements is not wise.
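To illustrate the difference (a rough analogy in Python, not sclang internals): iterating a hash table’s stored pairs directly is a single pass over the backing storage, while fetching each value by key repeats the hash lookup once per element. Both are linear, but the keyed version does redundant work per element:

```python
# Rough analogy: direct traversal (like keysValuesDo) vs. per-key access.
d = {"p%d" % i: [i, i * 2] for i in range(5)}

# Direct traversal of the stored pairs, one pass over the backing storage:
pairs_direct = [(k, v) for k, v in d.items()]

# Per-key lookup, re-hashing each key to fetch its value:
pairs_keyed = [(k, d[k]) for k in d]

# Same result either way; the keyed version just does extra hashing.
assert pairs_direct == pairs_keyed
```

The results are identical; the cost difference only starts to matter at the scale of a million-plus entries.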
Where I am at with this is:
KDTree with 1.25 million elements actually works! But performance is uneven: some queries come back quickly, while others take a noticeable hiccup.
For me, exactness is not the issue; speed is. So, I have randomly placed the 1.25 million elements into 100 KDTrees. Then I just ping one of the KDTrees at random for my NN, which will hopefully be “good enough” at timbre mapping inputs to complex synths.
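For anyone wanting to try the same trick outside SC, here is a minimal sketch of the idea in Python (the function names are hypothetical, and brute-force search stands in for the per-shard KDTree): randomly partition the points into k shards at load time, then answer each query from one randomly chosen shard. Each query only touches roughly 1/k of the data, at the cost of returning an approximate nearest neighbour.

```python
import math
import random

def build_shards(points, k=100, seed=42):
    """Randomly partition `points` into k shards
    (one shard per KDTree in the SC version)."""
    rng = random.Random(seed)
    shards = [[] for _ in range(k)]
    for p in points:
        shards[rng.randrange(k)].append(p)
    return shards

def approx_nearest(shards, query, rng=random):
    """Approximate NN: search only one randomly chosen shard.
    A real implementation would query a KDTree built per shard;
    brute force stands in for that here."""
    shard = rng.choice(shards)
    return min(shard, key=lambda p: math.dist(p, query))
```

With k = 100 and 1.25 million points, each query looks at roughly 12,500 points. The answer is the true nearest neighbour only when that neighbour happened to land in the chosen shard, which is acceptable when “good enough” timbre matches are the goal.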