Fluid.dataset~ json save sorting?

rodrigo.constanzo · February 19, 2021, 6:03pm

I don’t know if this has always been the case, but when printing the contents of my fluid.dataset~ I’m seeing this kind of sorting:

fluid.dataset~: 
rows: 1715 cols: 21
1     7.5122   -18.776   -12.415       ...  -0.27735     87.31   -53.293
10     1.5144   -18.605    -9.722       ...   0.38857    93.144   -56.418
100     1.0442   -10.104   -9.5277       ...   0.35186    77.097     -33.9
       ...
997     4.6128   0.61793    2.3301       ...   -1.5062     95.02    -47.25
998      5.323  -0.35285   -2.2085       ...  -0.36513    86.027   -42.957
999      3.578    -2.352   -2.4126       ...   -1.4234     83.26   -47.525

OK, looking at the .json file, it’s in this silly order too.

Tracing backwards through my process, I see that the original fluid.dataset~ that created this thing applied this sorting when executing the write command.

This is the fluid.dataset~ print message before writing:

fluid.dataset~: 
rows: 1715 cols: 21
1     7.5122   -18.776   -12.415       ...  -0.27735     87.31   -53.293
2    -1.3857    -16.77   -9.3112       ...  0.071646    86.881   -39.773
3   0.090661   -13.751   -2.7299       ...  -0.24238    86.861   -26.877
       ...
1713    -11.648   -19.998    4.0768       ...  -0.37862    98.319   -24.806
1714    -7.7397   -17.445   -1.0962       ...  -0.43559    92.834   -30.995
1715    -10.052   -18.663   -1.1894       ...  -0.99487    93.784   -30.296

Is this a byproduct of .json-ing?

It pretty severely limits the usability of the print message if you’re seeing this weird snapshot of the data, rather than the start/end.

I ran into this while trying to make a semi-generalizable 2D visualizer. I kept getting a fluid.dataset~: Point not found error, so I went to see if I needed to offset my uzi counting, only to discover that I can’t see where this dataset ends with print.

weefuzzy · February 19, 2021, 11:42pm

It has always been like this. In this case you are indeed seeing it as a byproduct of the JSON round trip, but, as with [dict] and JSON objects – and associative arrays in general – there’s no inherent ordering of dataset rows by ID / key. The IDs are strings in any case, so their ‘natural’ ordering would be that annoying one where 10 comes before 2, etc.
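
For illustration only, here is the same effect reproduced in Python (this is not FluCoMa’s own code): sorting the 1715 string IDs from the printout above as plain strings gives exactly the 1, 10, 100 … 997, 998, 999 pattern, while converting them to integers first restores 1 … 1715.

```python
# Illustration only: string IDs sort character by character, not numerically.
ids = [str(i) for i in range(1, 1716)]  # "1" .. "1715", like the dataset printed above

lexical = sorted(ids)           # plain string (lexicographic) order
numeric = sorted(ids, key=int)  # only valid if every ID is an integer string

print(lexical[:3], lexical[-3:])  # ['1', '10', '100'] ['997', '998', '999']
print(numeric[:3], numeric[-3:])  # ['1', '2', '3'] ['1713', '1714', '1715']
```

Note that key=int raises a ValueError as soon as a single ID isn’t an integer string, which is part of why there’s no one-size-fits-all ordering for arbitrary IDs.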

If you know you’re always using integer IDs, and you want to preserve order, then maybe dumping to a [dict] and then pushing the [dict] to a [coll] would work (haven’t tried: [dict] may well destroy the order in the process).

print is really just meant to be a quick diagnostic sanity check, to make sure that the dataset is the size one thought it was and that there isn’t total junk in at least some of the points. When you know that you’ve used integers as IDs, you can just feed the output of size into get to see what the ‘last’ point has.

If it’s any comfort at all, I see this biting people on the arse (me included) everywhere that [dict]-like structures are used (incl. SC and Python as well as Max). However, the point of them is that they are much more flexible than straight arrays: the IDs aren’t coupled to the size of the dataset, so they remain as specified even after items have been removed with DataSetQuery or deletePoint.
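
A minimal Python sketch of that trade-off, with a plain dict standing in for a DataSet (the rows are made-up numbers, not real features): the size-into-get trick only finds the ‘last’ point while the IDs are exactly 1 … N, and a single deletion breaks that.

```python
# Hypothetical stand-in for a dataset: string IDs -> feature rows (values made up).
dataset = {str(i): [float(i), float(i) * 2] for i in range(1, 6)}  # IDs "1".."5"

# While the IDs are exactly "1".."N", the size doubles as the 'last' ID.
print(str(len(dataset)))         # 5

# Delete a point: the size shrinks, but the remaining IDs keep their names.
del dataset["2"]
print(str(len(dataset)))         # 4  (ID "4" exists, but it is no longer the last one)
print(max(dataset, key=int))     # 5  (the actual highest ID)
print(sorted(dataset, key=int))  # ['1', '3', '4', '5']
```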

rodrigo.constanzo · February 20, 2021, 1:53am

I guess I never used print after loading files before, only when initially making them. Given that the order isn’t random, I suppose it’s “alphabetical” in terms of the order that it’s putting them in?

It’s not the end of the world, just a bit peculiar.

There are cases where what’s at the extremes of the fluid.dataset~ is important (like if something gets fucked along the way and the upper half is filled with zeros, which has happened to me before), so this gives you no (easy) way to make sure the fluid.dataset~ is filled with actual information from beginning to end (unless you happen to have exactly 99, 999, 9999, or 99999 entries).

Is there a way to change the, um, order of operations in terms of alphabetizing? A quick Google search shows that there are some standard ways of handling this.

jamesbradbury · February 20, 2021, 7:49am

It seems strange, but it is the usual way things are ordered. It’s the same problem in Python, and there isn’t a “simple” native way of dealing with it. Sorting can be expensive.

weefuzzy · February 20, 2021, 10:45am

There is no way to change the way JSON behaves in this regard, as its representation for key-value pairs (‘objects’) is explicitly unordered in the JSON spec. Be reassured, this isn’t because we don’t know how to sort things in general.

In any case, without changing the way the JSON is structured, there’s nothing to be done to preserve ordering in this case.

TBH, that use case feels a bit stretched – you want print to confirm assumptions about your data after it’s been to disk and back. ISTM that a programmatic solution would be more robust here:


----------begin_max5_patcher----------
1787.3ocwas0aaaCE9YmeEDd6QOCdUWJPA1CCEX+FVGBjsoSTqkjmDUaRKZ+
sORdjrcZosjqHcdHlfzVjmuy8ygJe8tYyWU8jrYN5Mn+AMa1Wua1L6RlEl0M
e17hrmVuKqw9ylutpnPVplu.9Nk7Ikc82kWtAU0pPe9w70Ohpq9bCJqVhx1s
C8EYcUS+SrKuTttpsz9XztE2WKaz6ZlJup7dm+h7M1ioZ0G9i3z98prsHubm
TYIMxwE0zQ+p39SHSs9w7xGtuVtVA.liiWhWfD1OIBrYfwVhQ+q4Q91c2Y9X
wj4KZFPFZSlJqQp4N4pGQMUERTipc61kn2kW2nPL5K4XX2rKlClQheXFBlA9
QQmvK3wdlWnIlUxZT0VDFsUyWjarndtCTE4GTwXVjDQIVbQSMCTrmw0e+WMf
fcPXI7CrhHV7DSAwEIYRvpT9YMw8SnpocUiJW0pjnM4qMVlY0OiJyJjtPF1I
xnNQV+pvRpm2KAXMe9B8en+8BZoBxRgVdJnVfyAfS8KvMncYa49r0ezZ29FG
3M5ZL6HmEuiAqftaZ5jfZgroI6A4Og02Om794H8mXXnaFEFXv.GFDvPDLDCC
IvPpc.dLJrWTXuncKB6EE1KJrWTXunvdQg8hB6E76Yvdwf8hA6ErGvV.6.rA
vyqe7w57LJ9JzZ+0jhzjHiTLFTUYXg0yZpe0XUnGjpOJetAsykt5030Y51lI
VayHVnLMcgP1UHG4mCgG8xYw5hwi4XB2lIA1LAvb2yzj8I4l60zn9QtOSopy
Wocn1zg7NnOatTGXbyIN8MGVs1SqRVeurLa0N4Y9xhr86O701u0xn8k6grMa
VWsqsnDgcw1oA27IhayNilZGhimjRkl0qpJcAD7EsPtfrXrfbUV4CWDnbh0y
.fSJu+Su5reecdoKyGQRvkio1fXrDHIaZH7MnPqbgsHeDndPwGQmQhw1Oh2C
s.DrVaMtuRKBQ+NAs2EVEAWNR3zSDjIjvHHyQ4tfGyGwwrl.vvH.JoyfL8nm
cOJPMUKfvKcgUb3EkfJKCS6K78VYSxStM1jII9yl7bEIseWto7cWnzcxkrvn
t14+ApHTHBh+GBMwENiBtpJzbBB.tIZKdFAY6Wb4xgeMIVxtnl5hiiCIL6fa
DzzBbPJx8C4pkkU4MRDAscWUlhQQziAKOkGf8gaW84oSVUm+6SClTcmcamiI
RXLbM3eU61sx5u6LVJ6ZZxHabn9WfGvrp7QAQEX6t17MK65L42QaqpbwGR7A
enWy2poML3gXubJXGfwg.8GD9n+rIqXeyYz9Yw9P6+.tGNnE1Fsh20gKFNHd
xcFvhEdG47D1I8iUDDAackokkM4eQh1zV3z1leEQlYWrgHC1fftNCjXaeWJO
LIYcQv5krlMmfAr1C4x0wZkvBp0kUb.kvp5rxlsU0Et.M8lzN5NvxSsfMML1
pGvoorueCeulLcgXR3MdAz10KFBIJHvUWSjRVav5ae6wtNcUWPHMM77BNzIC
PlGGl7syJ27i7gSQY36ZifXy0tGkrvUAkS.FcCKfRj1kvED9EeCR35r1xD1s
Omqd+1c2FNEGlHUOHUlaPDstZWCZ8NYVsK7S8A9MP1bL1Q6IMh9uR3QP+shC
TQlHrSccB40oPSlElfG8Dd3BVePn6.6o2rrw5ifwAjFge0Jqh9ZXgCQof25k
.UU0K.++0Jqe96tz1e07uwif5pHtQucGrWa7O7RQY2Qy5ujkzT0Vutmf5eOH
PGu9uMxFUdoMqkS9QlKjEcN19XOIyafvfmTO4LoSxTtDhLDlR7AlzA9PzgNo
Xfbl1IIFClX9.ShwnQ3iShLFtmUX5EMhgvjocBSVNYpdcP4DwGbOSIiCiIub
RrQbRdASQiASjW7aJxg6Xry+Wjv95qv41qWfSoGlMQRKcL1ezKQZBNDLGtDx
NB0N6vSTUuQVe3Ui3WmMNFm5DpGDXhw3Tm4CWshwnDJ7QfJSaUG9j7QfJd7H
Tp3I95jFDSQ9f6ItUZD7wD9Pv7kt2fxoXebRiwx0bUOSm6MFKWtWzxGi8D2G
xI9Mi6QFSlKWNZ.wVJKOEeZfJyroRZzQnrdQRKBmrLVSMjjX3Zs3K6m4CZCO
NZaZorPFehDSNsfA037wAQSmZpQbdJnpAEV1owQYSVpRSlJoIf+QG3LxojFi
LcRaL9d9AVqCqASG1h.iAfIZm4CZaPC0jKRa.eiEmB2g0wYS0PcLN3n9HeAq
KA5TTdhSAEZ3UatW4wL6FjxO4kj1gD3ISS0XD7jKpzJvBa0EvqiQGOwNyQ6i
x1u+Sx5lts2RTyKx9PkEHIKrSyKgo1ddMuV9o79euvtRV85GyUx0p1ZKPl+T
Dbo0yKpzrjx17NthlcnORae4L8XsYeFfba66t6a28+TOkWu.
-----------end_max5_patcher-----------

rodrigo.constanzo · February 20, 2021, 12:56pm

Hehe, yeah I get that.

I guess that the expectation, given the layout of print, is that you see the first/last three columns and the first/last three rows, whereas the rows shown are a semi-arbitrary snapshot of the data rather than the extremes.

That code is super handy for pulling out zeros, but there may be NaNs at the extremes, or some other numerical aberration at “the end” of the data. I get that JSON doesn’t keep track of that, but I would have presumed that there was some other index for the order in which items were added to the fluid.dataset~ in the first place (which I understand may also not be in numerical order).
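
As a rough programmatic alternative to eyeballing print, here is a Python sketch that scans a saved dataset for all-zero rows and non-finite values, regardless of key order. It assumes the file is laid out as {"cols": N, "data": {"<id>": [v0, v1, ...], ...}}; whether NaNs are written as bare NaN tokens or as null depends on the writer, so both are treated as suspect here.

```python
import json
import math

def is_bad(v):
    # null (None), NaN and +/-Infinity all count as suspect values
    return v is None or not math.isfinite(v)

def check_rows(dataset):
    """Return (all_zero_ids, non_finite_ids) for a parsed dataset dict.

    Assumes the layout {"cols": N, "data": {"<id>": [v0, v1, ...], ...}}.
    """
    zero_rows, bad_rows = [], []
    for point_id, row in dataset["data"].items():
        if any(is_bad(v) for v in row):
            bad_rows.append(point_id)
        elif all(v == 0 for v in row):
            zero_rows.append(point_id)
    return zero_rows, bad_rows

# Python's json module accepts bare NaN / Infinity tokens as well as null.
sample = json.loads('{"cols": 2, "data": {"1": [0.5, 1.0], "2": [0, 0],'
                    ' "3": [NaN, 2.0], "4": [null, 1.0], "5": [Infinity, 0]}}')
print(check_rows(sample))  # (['2'], ['3', '4', '5'])
```

With a real file you would pass json.load(f) instead of the inline sample (the file name is up to you).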

Even if the data structure can’t be preserved as such, perhaps the print command could go a step further and include a more robust numerical sorting for display purposes (e.g. 1, 2, 3…9, 10, 11, apples, bananas, carrots, etc…). Since the print command is (presumably) only used every now and then, even if that kind of sorting isn’t fast, it wouldn’t really matter much.
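
For what a display-only ordering like that could look like, here is a common ‘natural sort’ key sketched in Python (a hypothetical helper, not an existing fluid.dataset~ feature). It splits each ID into text and digit chunks and compares the digit chunks as integers, which gives 1, 2, … 10, 11 and then apples, bananas, carrots.

```python
import re

def natural_key(point_id):
    """Natural-sort key: digit runs compare as integers, text compares case-insensitively.

    re.split with a capturing group alternates text, digits, text, ..., so even
    positions are always strings and odd positions always ints; comparisons
    never mix types.
    """
    parts = re.split(r"(\d+)", point_id)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

ids = ["10", "2", "bananas", "1", "apples", "11", "carrots", "3", "9", "take10", "take2"]
print(sorted(ids, key=natural_key))
# ['1', '2', '3', '9', '10', '11', 'apples', 'bananas', 'carrots', 'take2', 'take10']
```

It is still a full O(n log n) sort on every call, so whether that overhead is acceptable for large datasets is exactly the question raised in the reply below.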

weefuzzy · February 20, 2021, 1:25pm

I guess what I’m saying is that for associative data structures like DataSet there isn’t a concept of first and last rows: it’s a happy coincidence that the storage order matches the insertion order for an instance that hasn’t been read back from file. I’m not sure I agree that the effects of doing a sophisticated sort for printing wouldn’t matter much, at least for bigger DataSets. I’m dealing with stuff with tens and hundreds and thousands of points at the moment, where I think the overhead of sorting would be pretty noticeable. It’s also not clear to me that there would be a sorting function that could really guess what someone meant in all instances.

The NaN thing is a problem – and one I’m currently dealing with in my own patch. Really, we don’t want NaNs to get into the JSON at all, if we can avoid it (which is tricky, in the general case). AFAIK, the main way they sneak in at the moment is with the various normalizing objects and columns that have a range of 0. I think we currently have open issues both for what to do with NaNs that we do encounter in the JSON (because this could come from anywhere), and what to do for 0-range columns.

Meanwhile, I’d still opt for a programmatic way to deal with this rather than eyeballing, and unfortunately I don’t see an especially beautiful way of checking the range of a column yet, except for fitting something like standardize, dumping, and checking for 0s in the std key (or whatever the equivalents would be for the other objects).
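
Outside Max, the same check is short. A Python sketch, assuming you already have the rows as lists of numbers (e.g. the values of the "data" key in the saved JSON, as assumed above): a column has range 0 when its maximum equals its minimum, which is exactly the case where a normalizing step would end up dividing by zero.

```python
def zero_range_columns(rows):
    """Return indices of columns whose max equals min across all rows.

    Assumes every row has the same length and contains no NaNs
    (NaN comparisons would make min/max unreliable).
    """
    return [c for c, column in enumerate(zip(*rows)) if max(column) == min(column)]

rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 0.0],
    [3.0, 5.0, 0.0],
]
print(zero_range_columns(rows))  # [1, 2]
```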

weefuzzy · February 21, 2021, 4:02pm

Along the same theme, if you fancy a laugh, I’ve just lost an embarrassing amount of time coming to the realisation that with polybuffer~, doing writefolder followed by readfolder doesn’t get you back to where you started (because readfolder, unsurprisingly in retrospect, loads in OS (lexical) order).

So, if you happened (say) to be relying on the indexing in polybuffer~ to map to IDs in a dataset, strange things might happen.

rodrigo.constanzo · February 21, 2021, 5:24pm

That’s good to know!

I did know about the readfolder thing, which means I’ve been zero-padding all my numbers (001, 002, etc.) so they load correctly into polybuffer~s. Didn’t know about the writefolder thing though.

weefuzzy · February 21, 2021, 5:27pm

writefolder's very convenient, except that it doesn’t pad the numbers in its file names, so if retaining the order matters, you end up having to do acrobatics.
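
One way to do the acrobatics outside Max is a small renaming pass before readfolder. A Python sketch, assuming hypothetical file names like kick.2.wav (I haven’t checked the exact names writefolder produces): every run of digits in the stem gets zero-padded, so lexical order matches numeric order. Pick a width that covers your file count.

```python
import re
from pathlib import Path

def padded_name(filename, width=3):
    """Zero-pad every digit run in the stem: 'kick.2.wav' -> 'kick.002.wav'.

    The extension is left alone, so e.g. '.mp3' is not touched.
    """
    p = Path(filename)
    stem = re.sub(r"\d+", lambda m: m.group(0).zfill(width), p.stem)
    return stem + p.suffix

def pad_folder(folder, width=3):
    """Rename the files in `folder` in place so lexical order matches numeric order."""
    for path in list(Path(folder).iterdir()):  # list() so renaming can't disturb iteration
        if path.is_file():
            new_name = padded_name(path.name, width)
            if new_name != path.name:
                path.rename(path.with_name(new_name))

names = [f"kick.{i}.wav" for i in range(1, 12)]
print(sorted(names)[:4])                          # ['kick.1.wav', 'kick.10.wav', 'kick.11.wav', 'kick.2.wav']
print(sorted(padded_name(n) for n in names)[:3])  # ['kick.001.wav', 'kick.002.wav', 'kick.003.wav']
```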

jamesbradbury · February 21, 2021, 6:43pm

Yes, this has pissed me off time and time again for doing any kind of batch processing. I’m sure the logical conclusion to Max is to just use a single js object in the middle and scripting names.

weefuzzy · February 21, 2021, 6:52pm

This was my ‘solution’. I will not be taking questions at this time.


----------begin_max5_patcher----------
1131.3ocyXs0aaaCE9Y6eEDBq.qnJzh2jn1aYWJvdqur0GxJBjrYxTfrnftj
5rhte6ijR1QNQ1lPlsdBF1fGJqy46b66P8k4y7RkaD0dfeBbCX1ruLe1LiHs
fY8qm4sNYyx7jZys4UH9rL8AO+tsZDaZLhaj0OsNUlucirUFwpa8pnsxjsM4
hllmJEcJzyC7o9sJSZV92YE2eakXYS2twQv.e.JNFFwYQHJt66HtOfY1BigA
6d.EsqyJTOciQhdVXmN6jpE904y0e4aIZWKpqStW7J3t3OpEU0KjeVTr3WkK
aWKJZpW7yxMfeQVU1Vu357bvGzOkW6OHmk+fwMXmgOtK.6LWvAB3kUhRQwJP
Ro9mQfIdp3j.iX9fn.DDuKpyY7XhOHN76Uf+.nttrJqn4NfJYWo.vapW7lwB
wn3yB6grX3yHOBwXpjdDlbgC4+SNnR73XvkeX35aIjQQvvAXFio9.d3TPL1c
H9AYVwX3M5rBurHFbXKMdrBjnPHlcYqnk4Ok1d2chp+EjjmWNFvoGMPmlTbu
U01TUp7fKLNRUaGLkZ6sdk5jGEqtUYiJccaRSSUVZaSGy1rctgYdh0oBCbBL
BLNNG49Z.qZWWBRA0fk4hjpwbeGtkn9+tyE524N6dLmlS.SQvgoSLlIQZRjj
TGRRVKqZ.WgTeFyWDLsZHDhXJTnwXHdXUDFc5AC9lWCsTlO1HPmn+nu88IYD
JL3kwZV.jMgP8TKc76knFAXYVclrPKMz4ETUJaU.z9pwnXCmNkClZFdhDQ0+
DxuvjLc37fyQwlNPQgFBTZ.GNLkILLT0CN9+mbqzI1VfyObaATPfo34B1XnR
buXSI.9t+RcA+waBtJ9Su6siM1rcsJrMzqZMPiGbQIZ2AeJTCLWxUlBxAi0o
jd33+PpQavNgffngiQqxDhYSA4DGRJZ5mAt92e+6Ae75+727A+vXrizoxN1A
cL2Dgu7GY3NY9JwXiBQBNZdt5PVVEhiHv3AQXBWcPIR7jFjb.nM+Ou7rhW9p
QLVhV99dhZYa0xsleeQL.sS6qD0MYEIM8Lk2rqQGPYgi5qsUO8OiipG9d2y5
rUkpFuM8.ColfzLVAlBCG7dVT0N59DciZMxlmocyrvto64CeocqNBktaeerV
Qlsc0t+grRm4scT+IapQVXp545O6X41GhMAyyWQ3S48CNVZClxfpRvmuXQ5n
folrOzXV4BK8T0QniZoZqgSFbgnZaipMQjpGgYr.8RWXpjSYpziYpJSSaUQD
yz7aK9LKOSiylLqHGjYwsoVwEJJzl7hnta5adiJ7d2yt1NnyqyN0hhzPWPgX
CGBIvUZB8cQSA1vu3BBXhMZh4.EgsoFF4BHoe20mNm2ILdV0uH1EZxJVbmPt
Fb1ij0yJwP7iLR1q2bOCuaB1jxxGEUaeCMFaVM09CRS+ItuYYVQ2RyTwpio9
3t2nCyHIoRM6ciZv61JCN81D1cDMu0RUmth1r9lcJukRklSDTjnN8SYRmiwb
vg4ec9+ABFKW5
-----------end_max5_patcher-----------

jamesbradbury · February 21, 2021, 6:53pm

That’s weird, I have this exact same abstraction on my computer…