Fluid.dataset~ json save sorting?

I don’t know if this has always been the case, but when printing the contents of my fluid.dataset~ I’m seeing this kind of sorting:

rows: 1715 cols: 21
1     7.5122   -18.776   -12.415       ...  -0.27735     87.31   -53.293
10     1.5144   -18.605    -9.722       ...   0.38857    93.144   -56.418
100     1.0442   -10.104   -9.5277       ...   0.35186    77.097     -33.9
997     4.6128   0.61793    2.3301       ...   -1.5062     95.02    -47.25
998      5.323  -0.35285   -2.2085       ...  -0.36513    86.027   -42.957
999      3.578    -2.352   -2.4126       ...   -1.4234     83.26   -47.525

Ok, looking at the .json file, it’s in this silly order too.

Tracing backwards in my process I see that the original fluid.dataset~ that created this thing applied this sorting on executing the write command.

This is the fluid.dataset~ print message before writing:

rows: 1715 cols: 21
1     7.5122   -18.776   -12.415       ...  -0.27735     87.31   -53.293
2    -1.3857    -16.77   -9.3112       ...  0.071646    86.881   -39.773
3   0.090661   -13.751   -2.7299       ...  -0.24238    86.861   -26.877
1713    -11.648   -19.998    4.0768       ...  -0.37862    98.319   -24.806
1714    -7.7397   -17.445   -1.0962       ...  -0.43559    92.834   -30.995
1715    -10.052   -18.663   -1.1894       ...  -0.99487    93.784   -30.296

Is this a byproduct of .json-ing?

It pretty severely limits the usability of the print message if you’re seeing this weird snapshot of the data, rather than the start/end.

The reason I ran into this issue is when trying to make a semi-generalizable 2d visualizer, I kept getting a fluid.dataset~: Point not found error, so I went to see if I needed to offset my uzi counting only to discover that I can’t see where this dataset ends with print.

It has. In this case you are indeed seeing it as a byproduct of the JSON round trip, but as with [dict] and JSON objects (and associative arrays in general), there’s no inherent ordering of dataset rows by ID / key (which are strings, in any case, so their ‘natural’ ordering would be that annoying thing where 10 comes before 2 etc).
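
To make the “10 comes before 2” thing concrete, here is a small Python sketch (not anything from the FluCoMa code itself, just plain string sorting) comparing string order with integer order for a made-up list of IDs:

```python
# IDs as they might appear as string keys in a dataset or JSON object.
# (Made-up example values, not taken from a real file.)
ids = [str(n) for n in range(1, 13)]

# Plain string sorting compares character by character, so "10" < "2".
lexical = sorted(ids)

# Converting to int for the comparison restores counting order.
# This only works when every ID really is an integer string.
numeric = sorted(ids, key=int)

print(lexical)
print(numeric)
```

The first list starts 1, 10, 11, 12, 2, which is the same pattern as the print output after the round trip.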

If you know you’re always using integer IDs, and you want to preserve order, then maybe dumping to a [dict] and then pushing the [dict] to a [coll] would work (haven’t tried: [dict] may well destroy the order in the process).

print is really just meant to be a quick sanity check diagnostic to make sure that the dataset is the size one thought it was and that there isn’t total junk in at least some of the points. When you know that you’ve used integers as IDs, you can just use size into get to see what the ‘last’ point has.

If it’s any comfort at all, I see this biting people on the arse (me included) everywhere that [dict]-like structures are used (incl. SC and python as well as Max). However, the point of them is that they are much more flexible than straight arrays, because the IDs aren’t coupled to the size of the dataset, so will remain as specified even after items may have been removed with DataSetQuery or deletePoint.

I guess before I never used print after I had loaded files, only when initially making them. Given that the order isn’t random, I suppose it’s “alphabetical” in terms of the order that it’s putting them in?

It’s not the end of the world, just a bit peculiar.

There are cases where what’s at the extremes of the fluid.dataset~ is important (like if something is fucked along the way and the upper half is filled with zeros, which has happened to me before), so this way gives you no (easy) way to make sure the fluid.dataset~ is filled with actual information from beginning to end (unless you happen to have 99, 999, 9999, or 99999 entries exactly).

Is there a way to change the, um, order of operations in terms of alphabetizing? A quick google shows that there are some standard ways this stuff is handled.

It seems strange but it is the usual way things are ordered. Same problem in python and not a “simple” native way of dealing with it. Sorting can be expensive :slight_smile:

There is no way to change the way JSON behaves in this regard, as its representation for key-value pairs (‘objects’) is explicitly unordered in the JSON spec. Be reassured, this isn’t because we don’t know how to sort things in general :stuck_out_tongue_winking_eye:

In any case, without changing the way the JSON is structured, there’s nothing to be done to preserve ordering in this case.

TBH, that use case feels a bit stretched – you want print to confirm assumptions about your data after it’s been to disk and back. ISTM that a programmatic solution would be more robust here:

Hehe, yeah I get that.

I guess that the expectation given the layout of print is that you see the first/last three columns, and first/last three rows, whereas the rows are a semi-arbitrary snapshot of the data rather than the extremes.

That code is super handy for pulling out zeros, but there may be NANs in the extremes or some other numerical aberration at “the end” of the data. I get that JSON doesn’t keep track of that, but I would have presumed that there was some other index for the order in which items were played into the fluid.dataset~ in the first place (which I understand may also not be in numerical order).
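
For checking the whole dataset rather than just the rows print happens to show, something along these lines might do. It is only a sketch in Python, working on the saved file rather than inside Max, and it assumes the JSON has a top-level `data` object mapping ID strings to lists of numbers (that layout, and the names `suspicious_rows` and `id_order`, are my assumptions for illustration):

```python
import math

def id_order(key):
    # Integer-like IDs sort by value; anything else sorts after them, alphabetically.
    return (0, int(key), "") if key.isascii() and key.isdigit() else (1, 0, key)

def suspicious_rows(dataset):
    """Return (all_zero_ids, bad_value_ids), visiting IDs in numeric order."""
    all_zero, bad_value = [], []
    for point_id in sorted(dataset["data"], key=id_order):
        row = dataset["data"][point_id]
        # None covers a JSON null; math.isnan covers a NaN that survived parsing.
        if any(v is None or math.isnan(v) for v in row):
            bad_value.append(point_id)
        elif all(v == 0 for v in row):
            all_zero.append(point_id)
    return all_zero, bad_value

# Tiny stand-in for a file loaded with json.load (assumed layout, made-up values).
example = {
    "cols": 3,
    "data": {
        "10": [0.0, 0.0, 0.0],
        "2": [1.5, float("nan"), 0.2],
        "1": [7.5122, -18.776, -12.415],
    },
}
print(suspicious_rows(example))
```

For a real file, `example` would be replaced by the result of `json.load` on the saved .json, and the two returned lists say which IDs to look at with get.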

Even if the data structure can’t be preserved as such, perhaps the print command could go a step further and include a more robust numerical sorting for display purposes (e.g. 1, 2, 3…9, 10, 11, apples, bananas, carrots, etc…). Since the print command is (presumably) only used every now and then, even if that kind of sorting isn’t fast, it wouldn’t really matter much.
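
The kind of display sort I mean is usually called a “natural” sort. A rough Python sketch of the idea, just to show the ordering and with no claim about how it would fit into print (the `natural_key` name and the example IDs are made up):

```python
import re

def natural_key(label):
    # Split into digit and non-digit runs: "a10b" -> ["a", "10", "b"].
    parts = re.findall(r"[0-9]+|[^0-9]+", label)
    # Tag each part so numbers compare with numbers and text with text;
    # numeric parts sort before text parts at the same position.
    return [
        (0, int(p), "") if p.isascii() and p.isdigit() else (1, 0, p.lower())
        for p in parts
    ]

ids = ["10", "bananas", "2", "apples", "1", "11", "carrots", "9", "3"]
print(sorted(ids, key=natural_key))
```

This puts the integer IDs first in counting order and the text IDs after them alphabetically.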

I guess what I’m saying is that for associative data structures like DataSet there isn’t a concept of first and last rows: it’s a happy coincidence that the storage order matches the insertion order for an instance that hasn’t been read back from file. I’m not sure I agree that the effects of doing a sophisticated sort for printing wouldn’t matter much, at least for bigger DataSets. I’m dealing with stuff with tens and hundreds and thousands of points at the moment, where I think the overhead of sorting would be pretty noticeable. It’s also not clear to me that there would be a sorting function that could really guess what someone meant in all instances.

The NaN thing is a problem – and one I’m currently dealing with in my own patch. Really, we don’t want NaNs to get into the JSON at all, if we can avoid it (which is tricky, in the general case). AFAIK, the main way they sneak in at the moment is with the various normalizing objects and columns that have a range of 0. I think we currently have open issues both for what to do with NaNs that we do encounter in the JSON (because this could come from anywhere), and what to do for 0-range columns.

Meanwhile, I’d still opt for a programmatic way to deal with this rather than eye-balling, and unfortunately I don’t see an especially beautiful way of checking the range of a column yet, except for fitting to something like standardize, dumping and checking for 0s in the std key (or whatever the equivalents would be for the other objects).
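
Outside of Max, the same column check can be done directly on a saved dataset file. This is a minimal Python sketch, assuming the JSON has a top-level `data` object mapping ID strings to equal-length lists of numbers (an assumption about the file layout; `zero_range_columns` is a hypothetical helper name), and assuming no NaNs are present, since those would confuse `min` and `max`:

```python
def zero_range_columns(dataset):
    """Return the indices of columns whose values are identical in every row."""
    rows = list(dataset["data"].values())
    # zip(*rows) transposes the rows so each tuple is one column.
    return [i for i, col in enumerate(zip(*rows)) if max(col) == min(col)]

# Stand-in for a file loaded with json.load (assumed layout, made-up values).
example = {
    "cols": 3,
    "data": {
        "1": [0.5, 2.0, -1.0],
        "2": [0.7, 2.0, 3.0],
        "3": [0.1, 2.0, 0.0],
    },
}
print(zero_range_columns(example))
```

Any index this returns is a column that would have a range of 0 going into a normalizing object.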

Along the same theme, if you fancy a laugh, I’ve just lost an embarrassing amount of time coming to the realisation that with polybuffer~ doing writefolder followed by readfolder doesn’t get you back to where you started (because readfolder, unsurprisingly in retrospect, loads in OS (lexical) order).

So, if you happened (say) to be relying on the indexing in polybuffer~ to map to IDs in a dataset, strange things might happen.


That’s good to know!

I did know about the readfolder thing, which means I’ve been zeropadding all my numbers to be 001, 002, etc… so they load correctly into polybuffer~s. Didn’t know about the writefolder thing though.

writefolder's very convenient, except that it doesn’t pad its numbers in the file names, so if retaining the order matters, you end up having to do acrobatics :sob:
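
One way to do the padding outside of Max is a small script that works out the padded names before renaming anything. A sketch in Python; the `buf.N.wav` file names here are invented for illustration and are not necessarily what writefolder produces, and `padded_names` is a hypothetical helper:

```python
import re

def padded_names(names):
    """Map each name to a copy with its first number zero-padded to a common width."""
    # Width of the longest number found, so every name gets the same digit count.
    matches = [re.search(r"[0-9]+", n) for n in names]
    width = max((len(m.group()) for m in matches if m), default=0)
    # Rewrite only the first run of digits in each name.
    return {
        n: re.sub(r"[0-9]+", lambda m: m.group().zfill(width), n, count=1)
        for n in names
    }

names = ["buf.1.wav", "buf.2.wav", "buf.10.wav"]
renamed = padded_names(names)
print(sorted(names))             # lexical order of the unpadded names
print(sorted(renamed.values()))  # lexical order once padded
```

The mapping could then be fed to `os.rename` one pair at a time, after which lexical order and counting order agree.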

Yes, this has pissed me off time and time again for doing any kind of batch processing. I’m sure the logical conclusion to Max is to just use a single js object in the middle and scripting names.

This was my ‘solution’. I will not be taking questions at this time.


That’s weird, I have this exact same abstraction on my computer…
