Fluid.bufnoveltyslice~ and the 'transbuf'

jamesbradbury · December 9, 2018, 4:17pm

I have questions about the current usage of bufnoveltyslice~. Why does it write out the index to a buffer? Could there be an option to output it as a list instead? It is a little cumbersome to query a buffer after the fact to get your slices out.

Also is the best way to retrieve your number of slices bang -> info~ -> msec to samples ?
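
For readers outside Max, the retrieval step asked about above can be sketched in Python. This is a rough illustration only: a plain list stands in for the index buffer, and the slice values, the 44100 Hz sample rate and all function names are assumptions, not part of the fluid.* objects.

```python
# Sketch only: a plain list stands in for the index buffer written by the slicer.
# All values and names here are hypothetical.

def buffer_to_list(index_buffer):
    """Copy slice positions (in samples) out of the buffer as integers."""
    return [int(round(v)) for v in index_buffer]

def samples_to_ms(samples, sample_rate):
    """Convert a position in samples to milliseconds."""
    return samples * 1000.0 / sample_rate

def frames_from_ms(length_ms, sample_rate):
    """Recover a frame count from a length in ms (the info~ route)."""
    return int(round(length_ms * sample_rate / 1000.0))

index_buffer = [0.0, 22050.0, 66150.0]  # hypothetical slice points, in samples
slices = buffer_to_list(index_buffer)
num_slices = len(slices)                # one frame per slice
slices_ms = [samples_to_ms(s, 44100) for s in slices]
```

The number of slices is simply the number of frames in the index buffer, which is why converting the buffer's length in ms back to samples gives the count.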

rodrigo.constanzo · December 9, 2018, 7:07pm

Yeah, I agree with that.

I imagine the paradigm of “everything is a buffer” is what they’ve gone with, but for something like this where you’re likely going to have <1000 segments (hell, <100), having it output a list, or reference a dict (and/or coll) would make for a simpler use case.

Not to mention if it’s a dict you can then query it by value instead of / as well as index (and/or other database-y queries).

OR

Is the “everything is a buffer” part of a long-term plan for the database format for stage2?

tremblap · December 9, 2018, 9:25pm

Indeed a single paradigm was deemed better - imagine if you don’t know how many slices you will get, how do you know how to switch? Small compromise for amazing flexibility.

rodrigo.constanzo · December 9, 2018, 9:48pm

Not sure I follow what you mean?

Can’t you query a dict to know how many entries you have?

Also, as a buffer does that mean you can only query for a specific index and that’s it? What if you want to know if a segment happens in the second half of the buffer? Or a segment that’s >200ms? Do you just have to peek~ a ton of entries and do the maths yourself?
How will you manage (if/when) you have segmentation offsets too, will those be successive entries in a buffer and you have to manage which kind of entry you’re looking at?
What about more complex data structures? Will you just use a buffer for the segmentation and then something else (dicts?) for the full database?
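
The kind of "do the maths yourself" queries mentioned above might look like this once the slice points are out of the buffer. This is a sketch under assumptions: the slice points, the source length and the 44100 Hz sample rate are invented, and the helper names are hypothetical.

```python
# Sketch only: querying slice points after copying them out of a buffer.

def segments(slice_points, total_length):
    """Pair each slice point with the next one (or the end of the source)."""
    ends = slice_points[1:] + [total_length]
    return list(zip(slice_points, ends))

def longer_than(segs, min_ms, sample_rate):
    """Keep segments whose duration exceeds min_ms."""
    min_samples = min_ms * sample_rate / 1000.0
    return [(s, e) for s, e in segs if e - s > min_samples]

def starting_in_second_half(segs, total_length):
    """Keep segments that begin at or after the midpoint of the source."""
    return [(s, e) for s, e in segs if s >= total_length / 2]

slice_points = [0, 4410, 30000, 70000]  # hypothetical, in samples
total_length = 88200                    # 2 seconds at 44100 Hz
segs = segments(slice_points, total_length)
long_segs = longer_than(segs, 200, 44100)
late_segs = starting_in_second_half(segs, total_length)
```

None of this is hard, but it does have to happen outside the buffer, which is the overhead being discussed.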

jamesbradbury · December 9, 2018, 11:44pm

I agree here, I think buffer~ is in many ways quite limiting because it’s quite a flat data structure to begin with and requires you to transform it immediately into something else if you want to query it in more interesting ways. That said, for really basic stuff it’s pretty nice as you can query everything at audio rate if you wish and it doesn’t require many objects.

Any chance all of the objects that do similar things could have a flag for output type? I know I am asking this like it’s a super simple request - but I know you and your team are expert coders and so I know everything is possible but more restricted by scope, time and the direction of your project among other things.

jamesbradbury · December 9, 2018, 11:56pm

rodrigo.constanzo:

Not sure I follow what you mean?

Can’t you query a dict to know how many entries you have?

I think PA means that if you don’t know how big or what kind of data you’re going to be getting a buffer~ is fairly agnostic and will suit a range of purposes.

rodrigo.constanzo · December 10, 2018, 12:05am

I don’t see how that’s different from a dict (or coll for that matter). I guess you can get into the millions of data points more easily than a database-y format, but unless I’m missing something, you would blow that efficiency with a lack of ability to do complex queries.

jamesbradbury · December 10, 2018, 1:14am

Well yeah,

Simple stuff is great in a buffer + it’s multi-channel.

As soon as you want to query in more interesting ways you have to start taking it out of the buffer which adds another layer of calculation.

tremblap · December 10, 2018, 8:20am

@jamesbradbury got it right… up until the end at least. Flexibility is important, more than a single user’s desire for a single type of querying. That is why we stay with the most open type, which can be converted into a plethora of search engines/database entries at the user’s choice (dada.base, coll, saved as audio, multichannel, no maxima, cross-platform, etc). The same decision goes for positions in samples instead of ms. Users can convert when they need to…

…although I think I might have found an issue here. I’ll report back.

rodrigo.constanzo · December 10, 2018, 8:31am

Is it flexible if the data is divorced from symbolic meaning? (i.e. if you have an “everything is a buffer” database, you have literally no idea what kind of data it is, at any positions, or in any order. without some point of reference it would become noise.)

And it would require translation for literally any usage at all.

It’s like saying writing it as a memory blob is the most flexible because it can be turned into anything…

Or like choosing MIDI (flat, context-less, serial) over OSC (symbolic, data hierarchy, etc…) because it’s more “portable”.

Why wouldn’t a json, text, or actual database file be more cross-platform-y? And be more relevant to the type of data being stored.

Will there be tools/abstractions provided to translate the data, or is it up to every user to bake their own for anything they want to do?

And are you planning on using buffers for the stage2 database matching stuff? (will there have to be a meta-data file which references what is what in the “audio” data? if so, what’s the point of having it be separate files?)

(in the concat database stuff I did, I would use two files, one for data and one for meta-data, for speed/performance reasons, but even then, a text file can be multidimensional, with both indices and “columns” whereas a buffer is only 2d)
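
The data-plus-metadata split described above can be sketched as follows. This is only an illustration: the column names, values and JSON layout are made up, and nothing here reflects an actual FluCoMa or C-C-Combine file format.

```python
import json

# Flat "buffer": one row per slice, one column per descriptor, no labels.
data = [
    [0.0, 0.12, 440.0],
    [22050.0, 0.30, 220.0],
]

# Hypothetical sidecar metadata saying what each column means.
metadata_json = json.dumps({
    "columns": ["onset_samples", "loudness", "pitch_hz"],
    "sample_rate": 44100,
})

def labelled_rows(data, metadata_json):
    """Attach the column names from the metadata to each flat row."""
    columns = json.loads(metadata_json)["columns"]
    return [dict(zip(columns, row)) for row in data]

rows = labelled_rows(data, metadata_json)
# With labels attached, a query by value becomes readable.
loud = [r for r in rows if r["loudness"] > 0.2]
```

The flat data stays fast to read, while the metadata carries the symbolic meaning that a bare buffer lacks.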

jamesbradbury · December 10, 2018, 10:55am

Thanks for taking our nagging on board

weefuzzy · December 10, 2018, 4:54pm

Thanks all, great to see some experiences of how this feels in practice. I, personally, don’t feel like this way of doing it is set in stone, but it may help if I explain why we ended up using buffers.

First, we wanted something broadly similar in both Max and SC. Second, we wanted something that would scale. The drawback with lists: in Max there’s a hard upper limit on the size (32768 or so), in SC it would involve an episode of server->language-side transfer, possibly of much data.

I agree that it’s not ergonomically ideal. @jamesbradbury, having a flag to select isn’t a bad idea. I can’t imagine that the default case would be for huge numbers of markers. I’ll see how tricky it would be.

I think it’s unlikely that we’ll try and use buffers for all our future data representation needs.

jamesbradbury · December 10, 2018, 4:55pm

The list limit is something I had not considered (yet have come up against with great frustration in the past).

Thanks for the clarification!

rodrigo.constanzo · May 26, 2019, 7:39pm

I wanted to bump this thread too, after hearing that when fluid.ampslice~ is working properly, it will write to two separate buffers for demarking ‘onsets’ and ‘offsets’.

That seems like it is going to get messy quickly, and even more so when you have multidimensional data.

Is the “everything is a buffer” thing now finally set in stone?

jamesbradbury · May 26, 2019, 7:42pm

I bet you want it in a dict? Just a hunch

I think that everything should be in buffers or nothing - and that it’s a sane choice for everything to be in buffers because of the flexibility of that data format. You can make an abstraction to push it wherever you want it to go!

rodrigo.constanzo · May 26, 2019, 7:50pm

Hehe, they are legible, but too slow really, for big data. Did lots of testing for the update of C-C-Combine and coll/text was the way to go, by a significant margin, in terms of speed.

So not really sure what’s best, but I do have “architecture” on the brain from the other thread, and was reminded of this.

For me, all my concerns about buffers still hold up in that it lacks context/meaning, and importantly, dimensional scalability. And in playing with it “in context” in the other thread, I can see having to go in and out of buffers constantly adding a significant overhead to everyday processing.

jamesbradbury · June 3, 2019, 2:52pm

What is interesting is I looked back to the start of this thread.

I would say I’m a total buffer convert now, which was not the case initially. I’m not sure I can even put my finger on what moved my opinion one way or another. I guess small abstractions to retrieve and query buffers are what have helped…

rodrigo.constanzo · June 3, 2019, 3:37pm

For some things it’s definitely useful (though I’m not convinced of the speed when creating longer processing chains).

It gets confusing when that same dimension is used to express two types of data (like the start/end boundaries and slice points, like the problem in this original thread, which also shows up in @leafcutterjohn’s great patch/abstraction).
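
To make the boundary problem concrete, here is a small sketch of filtering the buffer boundaries out of a slice list. It assumes, going only by the description in this thread, that the boundaries show up as 0 and the source length; the function name and values are hypothetical.

```python
# Sketch only: assumes the index buffer holds the source boundaries
# (0 and the source length in samples) alongside the detected slice points.

def strip_boundaries(slice_points, source_length):
    """Drop entries that merely mark the start or end of the source."""
    return [p for p in slice_points if 0 < p < source_length]

raw = [0, 1000, 5000, 88200]  # hypothetical buffer contents
detected = strip_boundaries(raw, 88200)
```

Note that a genuine slice detected at sample 0 would be dropped as well, since nothing in the data distinguishes it from a boundary.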

I honestly can’t picture the solutions that will have to come when the data gets multidimensional. Hell, even with a single dimension I’m struggling to figure out how to manage long “buffer lists” of stats and descriptors, when I have a buffer with 30+ frames in it, each frame is in a different range and means something else, and I never know whether the first channel is 0 or 1.

I guess this will matter less when individual channels don’t mean as much, and it all ends up in the hands of the machine learning algorithms, where they don’t need or care what the numbers are.

What really turned my head though, was to learn that something like the ampslice~ object will have separate buffers (I presume) for onsets and offsets. That seems like a pain in the butt.

tremblap · June 3, 2019, 8:08pm

thanks James for the self-reflection. I’d say it might also be a learning curve of a new paradigm we present, and that @weefuzzy and his minions (I for one) could meditate on a way to make it smoother.

These are for me, in effect, one and the same type, if you think of the buffer boundaries as slice points, which, in the grand continuum of digital silence, they are.

These are, on the other hand, 2 different things. You could need one without the other, and should definitely think about them as different, interrelated things, like life and death. On and off. And other binary states.

rodrigo.constanzo · June 3, 2019, 8:31pm

They are both the same type of information, from a continuum of time, but one is what you asked for from the object, and the other isn’t.

Like looking up the definition of a word, and finding the letters of the word added to the beginning and ending of the definition.

I understand why the boundaries are there, particularly in the context of multithreading (in the future), but it’s confusing, and it uses the same data vector to represent two things (boundaries and data).

Ah yes, the toggle object of the forthcoming Max 9:

Or the fluid.buffer~ object, which keeps all the values above and below 0. separate:

But seriously, they are different things but they belong together, particularly since, sans context, each one would be indistinguishable from the other (mistaking the offsets for onsets etc…), and unfortunately that is the case with buffers. I need to know what is in a buffer before I query it, since there’s no way to add symbolic information about what it is.

If everything is a buffer, having them back-to-back in a single buffer is bad too, since, again, you wouldn’t know what is what. So given the choices of a single buffer or two, two buffers is better, but even better than two buffers is zero buffers IMO.
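
For what it's worth, joining two separate onset/offset buffers back into labelled segments could be sketched like this. It assumes one offset per onset, in the same order; that is a guess about how such an object might behave, not documented behaviour, and the names and values are hypothetical.

```python
# Sketch only: two plain lists stand in for separate onset and offset buffers.

def pair_onsets_offsets(onsets, offsets):
    """Zip onsets with offsets, checking that each pair is well formed."""
    if len(onsets) != len(offsets):
        raise ValueError("onset and offset counts differ")
    pairs = list(zip(onsets, offsets))
    for on, off in pairs:
        if off <= on:
            raise ValueError("offset does not come after its onset")
    return pairs

onsets = [100, 5000]   # hypothetical, in samples
offsets = [2000, 9000]
pairs = pair_onsets_offsets(onsets, offsets)
# Labelling the pairs restores the context the raw buffers lack.
labelled = [{"onset": on, "offset": off} for on, off in pairs]
```

The order check catches the obvious case of the two buffers being swapped, but only because the values happen to disagree; the buffers themselves carry no label saying which is which.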