Loading a dict from JSON should be the same as reading the file directly

jamesbradbury · August 1, 2020, 11:43pm

JSON has a spec. For “0” it’s a string. [0] is an array, and 0 is what they call a number, as JSON doesn’t differentiate between floats and ints. When you say ID, do you mean the key or the value in the key/value pair? If it’s the key then it’s always a string no matter what.
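A minimal sketch of those rules using Python's standard `json` module (the variable names are only for illustration). Note that Python's parser goes one step further than the spec and hands back an `int` or a `float` depending on how the number is written.

```python
import json

# The same character "0" in three different JSON forms.
as_string = json.loads('"0"')   # JSON string
as_array = json.loads('[0]')    # JSON array holding one number
as_number = json.loads('0')     # JSON number

print(type(as_string).__name__, type(as_array).__name__, type(as_number).__name__)

# Keys are always strings in JSON: Python coerces an int key when dumping,
# and the key comes back as a string when the text is parsed again.
dumped = json.dumps({1: "one"})
reloaded = json.loads(dumped)
print(dumped, list(reloaded.keys()))
```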

tedmoore · August 1, 2020, 11:47pm

I meant it could be a value.

This makes sense. So any confusion would be created by the person making or parsing the JSON file. I gotta go check my JSON writer and see if I did this right!

jamesbradbury · August 1, 2020, 11:57pm

Then yep, it is even more essential that whatever is reading or writing it gets it correct, or the law of least surprise will be violated.

tedmoore · August 2, 2020, 12:01am

Sorry this is getting niche… it turns out that my JSONFileWriter does do this properly, but the JSONFileReader that I’ve been using (not authored by me, ahem) interprets all numbers as strings, which is perhaps why I was leaning that way, because it’s what I’ve been doing… for a while. I’m going to write a new JSON reader that does this right. Thanks to @jamesbradbury!
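To make the difference concrete, here is a small Python sketch (not the SuperCollider reader itself). The `parse_int` and `parse_float` hooks are real `json.loads` parameters; passing `str` for both is only a stand-in for a reader that turns every number into a string, and the sample document is made up.

```python
import json

text = '{"id": 7, "gain": 0.5, "label": "7"}'

# Spec-respecting parse: numbers stay numbers, strings stay strings.
correct = json.loads(text)

# Emulate a reader that returns every number as a string.
# After this, the number 7 and the string "7" can no longer be told apart.
lossy = json.loads(text, parse_int=str, parse_float=str)

print(correct)
print(lossy)
```

Once the type information is thrown away at read time, nothing downstream can recover whether the file held 7 or "7".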

jamesbradbury · August 2, 2020, 12:04am

Woohoo! You’re making the world a more json friendly place.

spluta · August 2, 2020, 3:39pm

I agree that you can write a reader that assumes a float or integer, but that doesn’t make it right. I just don’t know how you would know, unless you know already, what you are getting. Catch my drift?

jamesbradbury · August 2, 2020, 3:41pm

Not sure I get ya. Do you mean there is no way of knowing what the user intended when they stored the JSON if they did “0” vs 0?

spluta · August 2, 2020, 3:42pm

Right. How do you know if 34.567 is a string or a float when looking at a text file, unless that information is stored in the text file?

Or maybe it is inherent in how it is stored? Maybe I don’t know what I am talking about?

spluta · August 2, 2020, 3:51pm

To be clear, the SC JSON parser was hacked together by someone 10 years ago and isn’t at all documented, so there is a very good chance that it just stinks.

jamesbradbury · August 2, 2020, 4:28pm

It’s inherent to how it’s stored.

If you take a dictionary in Python like:

{
"foo" : 0
}

and serialise that to a JSON object it will look exactly like that on disk if you use the same indentation scheme. If you load that JSON back into Python, it first opens the file and reads the raw bytes as a string, then deserialises that string to a dictionary. So the answer is that you know what the JSON type is from how it is represented on disk, and the representation of a JSON value can change depending on the program. However, it is the program’s job to correctly interpret the string blob and marshal the result into a type. If the type is a JSON string you would hope that the program turns that into a string in the language of your choice. This is pretty easy as most programming languages have primitives which line up with what JSON provides (array, number, string, true, false, null, JSON object (for representing nested JSON objects or hash tables)).

This is also why there are extensions to JSON like the binary version and jsonc (json with comments) (so YAML but worse), and libraries like orjson which serialise native Python types such as dataclasses and datetimes directly. People can write their own JSON loaders and dumpers, but at a certain point it’s not truthful to say that it’s still JSON if it doesn’t follow the spec or breaks compatibility in a way that expectations between types are mismanaged.
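The round trip described above can be sketched with the standard library alone. The file name and the extra keys are assumptions added for illustration; only `"foo": 0` comes from the example.

```python
import json
import os
import tempfile

original = {"foo": 0, "bar": "0", "baz": [0, 1.5, None, True]}

# Serialise: Python dict -> JSON text on disk.
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(original, f, indent=4)

# What sits on disk is plain text; the quoting alone carries the type.
with open(path, encoding="utf-8") as f:
    raw_text = f.read()

# Deserialise: JSON text -> Python dict.
restored = json.loads(raw_text)

print(raw_text)
print(restored == original)
```

The number and the string survive as different types because `0` and `"0"` are written differently in the text.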

spluta · August 2, 2020, 4:58pm

tedmoore · August 2, 2020, 5:33pm

Ha, funny that thread is “closed”. I’m going to take a whack at writing one.

What I learned from @jamesbradbury last night (correct me if I’m wrong) is that if something is stored on disk as 0, it’s a number but if it’s stored as "0" it’s a string. A parser should cast them accordingly!

SuperCollider’s parser doesn’t respect this.

For example:

{
"string":"0",
"number":0,
"array-of-strings":["0","1","2"],
"array-of-numbers":[0,1,2]
}

I’m gonna refer to this: JSON

tedmoore · August 2, 2020, 5:53pm

Just to make sure this doesn’t get lost in the conversation, I invite @tremblap and the team to consider this for the SuperCollider implementation. TLDR is to use strings as keys rather than symbols for FluCoMa IdentityDictionary stuff (currently it’s symbols). Unless there’s a specific reason why symbols are preferred? Also, does @spluta have an opinion on this?

For example, here’s a method I added to FluidLoadFolder:

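	// Play the slice stored under `id` in the index; returns its duration in seconds.
	playID {
		arg id;
		var entry, start_samp, end_samp, sr, dur_secs, rate;
		entry = index.at(id);
		// Keys are Symbols if the index was built in memory, Strings if it was loaded from disk.
		if(entry.keys.choose.class == Symbol,{
			start_samp = entry['bounds'][0].asInteger;
			end_samp = entry['bounds'][1].asInteger;
			sr = entry['sr'].asFloat;
		},{
			start_samp = entry["bounds"][0].asInteger;
			end_samp = entry["bounds"][1].asInteger;
			sr = entry["sr"].asFloat;
		});
		dur_secs = (end_samp - start_samp) / sr;
		// Compensate for a file sample rate that differs from the server's.
		rate = sr / buffer.server.sampleRate;
		{
			var sig = PlayBuf.ar(buffer.numChannels,buffer,rate,0,start_samp);
			// 30 ms fade in and out; free the synth when the envelope ends.
			sig * EnvGen.kr(Env([0,1,1,0],[0.03,dur_secs-0.06,0.03]),doneAction:2);
		}.play;
		^dur_secs;
	}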

You’ll notice my goofy if statement in there, but that’s because if I’ve just created the Loader and it’s only been in memory the keys are symbols (because that’s what FluCoMa makes them by default), however if I load the index from disk, the keys will be strings!

Also, you’ll notice that I made the Loader’s index settable (var <> index;) so that I could load an index from disk and then store it with its buffer.

jamesbradbury · August 2, 2020, 5:58pm

I’m no wizard at C, but it would likely be worth pulling in an existing library and wrapping it up. I use this one, which is C++ but has a Python binding. It is WAY faster than the default implementation.

spluta · August 2, 2020, 6:05pm

Strings seem fine in SC. I don’t know if this is something that has changed over time, though. The help file for Dictionary still says: "You must only rely on equality for the keys (e.g. symbols are ok, strings not)."

Does FluCoMa use Dictionary or IdentityDictionary? I think this is important. Strings won’t work in the IdentityDictionary:

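// Dictionary compares keys with ==, so an equal string finds the entry.
d = Dictionary.new;
d.put("abc", 10);
d["abc"] // -> 10

// () is an Event, a subclass of IdentityDictionary: keys are compared with ===.
a = ();
a.put("foo", 2.718);
a["foo"] // -> nil, because this "foo" is a different String object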

tedmoore · August 2, 2020, 6:27pm

Ah, this is a very good point. It turns out strings don’t work as keys in IdentityDictionary: it matches keys by === (identity), whereas Dictionary matches by == (equality), which makes IdentityDictionary faster.

FluCoMa uses IdentityDictionary. However when I load an index from disk it becomes a Dictionary… with strings as keys.
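The identity versus equality distinction can be sketched in Python as an analogy. This is not how SuperCollider implements it: `Text` and `IdentityDict` are hypothetical classes made up for the illustration, with `Text` standing in for an SC String (equal by content, but each instance a separate object).

```python
class Text:
    """Hypothetical stand-in for a string: equal by content, distinct as objects."""

    def __init__(self, chars):
        self.chars = chars

    def __eq__(self, other):
        return isinstance(other, Text) and self.chars == other.chars

    def __hash__(self):
        return hash(self.chars)


class IdentityDict:
    """Toy dictionary that matches keys by object identity only."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # Keep a reference to the key so its id() stays valid.
        self._items[id(key)] = (key, value)

    def at(self, key):
        entry = self._items.get(id(key))
        return None if entry is None else entry[1]


stored_key = Text("foo")
lookup_key = Text("foo")  # same characters, different object

by_equality = {stored_key: 2.718}
by_identity = IdentityDict()
by_identity.put(stored_key, 2.718)

print(by_equality.get(lookup_key))  # found: the keys compare equal
print(by_identity.at(lookup_key))   # not found: it is a different object
print(by_identity.at(stored_key))   # found: the very same object
```

Symbols sidestep the problem because two symbols with the same characters are the same object, so an identity lookup always succeeds.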

This from the “Symbol” helpfile:

A symbol, like a String, is a sequence of characters. Unlike strings, two symbols with exactly the same characters will be the exact same object. Symbols are optimized for recreating the same symbol over and over again. In practice, this means that symbols are best used for identifiers or tags that are only meaningful within your program, whereas you should use a string when your characters are really processed as text data. Use symbols to name things, use strings for input and output.

Good uses of symbols include symbolic constant values and Dictionary keys.

Since these files are going to be I/Oed, I think that suggests using strings, and therefore Dictionary, unless there’s a way to load from disk (like a JSON) into an IdentityDictionary. If one is already parsing a JSON, it’s kind of trivial whether it is put into a Dictionary or an IdentityDictionary. Maybe I’ll try to include a flag for that.

tedmoore · August 2, 2020, 9:17pm

Found something I could fork and modify. It will return an IdentityDictionary by default (with keys as symbols obviously), but I also added a “keys_as_strings” boolean flag that will make it return a Dictionary with… “keys_as_strings”.

tremblap · August 3, 2020, 7:29am

This is some @weefuzzy magic and I’m sure it is well considered (it mostly is), but I remember a discussion with @groma on this and the value of one over the other…

one thing to check: in Max and SC our formats are directly loadable in native Dicts and vice versa. That allows the dump message under the hood to use (in SC) - did you have a look at the dump method in the class def?

weefuzzy · August 3, 2020, 9:00am

We have a slight inconsistency. I used IdentityDictionary in FluidLoadFolder et al, just because (well, I’d just been using them for the language-side cache of DataSet/LabelSet, where I think they do make sense). Meanwhile, our data objects all use Dictionary (and produce string keys, but should cope with symbols).

I will possibly move over to using plain Dictionary in the utils, because it’s easier to deal with.

tedmoore · August 3, 2020, 5:17pm

Ooh, look at that json parsing code right there!

Yeah, I always load directly into the FluCoMa objects when I can. The one place where I think one can’t do this is with an index from a FluidSliceCorpus, which I’ve been saving as a JSON. That’s where the symbol vs. string key has been a hiccup.