No, because you’re still including the time to switch threads in that measurement, which will depend on what else the main thread is doing (in this instance, quite a lot). As such, you need to distinguish between the intrinsic speed difference between your aggregated fluid. processes and descriptors~ (which we think is ~8-10x in this mode of operation), and the operational difference when using these things as quasi real-time objects. Clearly, the latter is what concerns you, but your statement is about the former (or seems to be).
The distinction matters because they have completely different solutions. The intrinsic difference would be addressed by profiling and optimising the algorithm itself, including assessing the actual overhead incurred by multiple STFTs. The latter would be addressed by making the objects safe to run in the scheduler thread, viz. not attempting to resize buffers. Given that this would render some objects useless, it would have to be some sort of option (if it ever happened), and could complicate usage somewhat (by introducing surprising behaviour).
In a practical context, there will likely be loads more stuff going on in the patch than this comparison, so I don't really think I'd end up with a stripped-back version like this example.
It could also be that I further/additionally misunderstand, in that all of the fluid. objects are internally defer-ing in the same way, so having a bunch of those happening at once in the same patch is the culprit.
OR I just completely misunderstood what you meant.
Lastly, the 0.5ms speed I got from the fluid.descriptors was when I had deferlow in the “wrong” place, so it could be the 8-10x slower number isn’t accurate at all, hence this line of inquiry.
Not really: it will gum up things a bit on the main thread, but mostly to the detriment of screen refreshes (which always seem to be placed at the back of the queue), rather than fluid processes, which are placed at the front.
All the screenshots you just posted show the time for a thread switch + the fluid processing. Below, I have deferred each bang that starts the timer, so what you are now measuring is the time between kicking off the fluid stuff and it finishing, without taking the thread switch into account. (Ignore the unplugged grainPlay; it makes no difference.)
You’ll see we’re back where we were (although I note with all the blocks plugged in, there are a lot of outliers in the fluid timings.) This is what I was getting at by distinguishing between an intrinsic difference (i.e. the difference in actual processing time) vs operational (i.e. having to take the thread switch into account).
What this whole discussion suggests to me is that in an ideal world our non-realtime objects would happily support three quite distinct contexts of usage in Max:
Heavy processing, it’s done when it’s done -> delegate to new threads
Medium processing, non-critical timing -> run on main thread, without fear of gumming up max: this is what we currently have
Light processing, critical timing -> run on scheduler thread when available
Work is well under way towards adding support for the first of these. The third, as I said, is tricky, because it just wouldn’t make sense for a number of the non realtime objects not to be able to resize destination buffers.
Ah ok, I understand the intrinsic vs operational thing now.
The operational difference is what I meant with the whole “real-world” stuff, as the intrinsic speed, although handy, doesn’t tell the whole story.
I do like the idea of having multiple contexts, but it strikes me that so much of that appears to be having to jump through hoops to avoid the core paradigm(/problem?) of 'everything is a buffer'.
Because you have to go in/out buffers to manage the data between processes, it makes threading a problematic/contentious issue. Which makes sense if you are working with audio that natively lives in buffers, but that isn’t the case here. The buffer is just being bent to work as a data container, which brings lots of problems with it.
I haven't mocked up a proper example of my intended use case yet, but I am hoping to do "real-time" sample playback via onset descriptor analysis, so already taking on a 512-sample (~11ms) delay between my attack and a sample being played back is knocking on the door of perception. Tacking on another 1ms to that doesn't help, but if the operational ("real world") delay/latency there is upwards of 20-40ms, this approach (with fluid. objects anyway) becomes impossible/impractical.
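To put rough numbers on that latency budget, here is a small sketch. It assumes a 44.1 kHz sample rate, and reuses the ~1ms and 20-40ms figures quoted above:

```python
SR = 44_100  # assumed sample rate (Hz)

def samples_to_ms(n, sr=SR):
    """Convert a sample count to milliseconds."""
    return 1000.0 * n / sr

window_ms = samples_to_ms(512)   # ~11.6 ms just to fill the analysis window
best_case = window_ms + 1        # + ~1 ms intrinsic processing cost
worst_case = window_ms + 40      # + 20-40 ms operational (thread-switch) cost

print(round(window_ms, 1), round(best_case, 1), round(worst_case, 1))
```

So the intrinsic cost keeps the total near the ~12ms "knocking on the door of perception" figure, while the operational cost can push it past 50ms, which is where the approach stops making performative sense.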
Now as @tremblap mentioned, I could always use @a.harker's vanilla descriptors~ in this context (which I will/would do, though there is some funky behavior, as can be seen in the comparison patch above), but I do like a lot of what the fluid.descriptors objects bring to the table. Plus, this sort of idea will make up the core of real-time matching/replacement/resynthesis in the future, with the 2nd toolbox. So this isn't a problem that is unique to my intended use case; it appears to be a problem that will be present in many use cases (that at all involve real-time applications).
So this circles back to the core architecture/paradigm decisions, and how they impact scalability and application in “real world patches”, similar to what @jamesbradbury is going through with his patch trying to analyze multiple slices from an audio file. A seemingly simple use case, which brings with it a massive amount of overhead and conceptualizing due to how things are structured.
Not in this particular instance, no. (this phrase might be a refrain throughout the following ) You have two separate gripes, and given that they have different solutions, it’s still important to keep them distinct. ‘Real-world’ here means using these objects in real-time, with overdrive enabled because you want tight timing from Max. That throws up considerations that are quite distinct from, say, large scale offline processing.
Respectfully, that wasn’t so much an idea as a statement of the possible roles our non-realtime objects play. I think we differ on how much of a mainstream use pattern using such things as real-time with high timing expectations might be, but I was trying to indicate that it is a use pattern we take seriously, whilst emphasising that there are others that we also have to support.
Using buffers like this is trading off one set of difficulties for another. On the upside, they are accessible from every ‘domain’ in all our host environments (i.e. messages, signals and jitter in Max), and they scale well. Part of the experiment here is to try and develop a more approachable paradigm; descriptors is powerful, but very complex to use, likewise MuBu .
Meanwhile, of course, figuring out if that set of upsides is worth it relative to the varied kinds of thing we want to do in practice is exactly the collective work we’re engaged in here. That will involve some head banging, inevitably, and clearly part of that work is developing the right kinds of supporting scaffolding to make things simple: knowing what those things are (or might be), and what counts as simple is exactly the kind of discoveries that come from actually using the things in their raw-ish state, like this. Of course, it may be that it can’t be made to work in the end, and we have to go back to the drawing board, but I feel it’s still somewhat early to consider junking the whole set of design decisions just because we don’t (yet) have the abstractions needed to make work completely painless.
This feels like a different set of problems, to do with performance rather than ergonomics. The first thing to note is that, irrespective of the particular toolset, you're asking a great deal of any machine listening system to try to achieve analysis and response within a perceptual fusion window (although I think you're pessimistic about how short that window is for audiences in halls, rather than critically listening drummers). We've discovered that transitioning between the scheduler and main thread adds considerably more overhead than the process itself. This would tempt me to simply turn off overdrive in this particular case, although I acknowledge that this decreases timing certainty in general.
It will be an aspect of toolbox 2, but there are many other things to consider there (including developing a quickly-query-able data structure). Real-time is challenging in this particular case of analysing different-sized chunks of buffer on the fly for features and statistical summaries, but rather than signalling the awfulness of the buf versions of the objects, it could point to the desirability of adding a new behaviour to the signal-rate ones (e.g. a fluid.stats~ that outputs summary statistics between clicks in a secondary inlet; being able to choose (in Max) between signal and list outlets for certain control objects, etc.)
Yeah totally. I don’t want to seem like I’m (needlessly (and superfluously)) busting balls here. I’m just pushing at the edges of the existing paradigm/architecture, and offering thoughts and solutions(/problems?) as to how it can be made to work better.
There are lots of decisions that I don’t understand, but I’m rolling with it and trying to build the things I want to work with it, but that doesn’t always lead to somewhere that “works”. [So far, almost every avenue of exploration has led to a dead end (barring the CV thing, which I want to explore further still). That’s ok. I’m still playing and learning the tools.]
They scale in length, but not in quantity. You can have an arbitrarily long buffer, but you can't (without great hassle and messiness) have an arbitrary number of buffers. Sure, you can use a single 'container' buffer and bufcompose~ everything into it, but then you need a secondary data structure, for your primary data structure, to know what was where in the mega-buffer.
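The "secondary data structure for your primary data structure" bookkeeping can be sketched like this (hypothetical names; in Max this would more likely live in a dict or coll than in Python):

```python
# Hypothetical index for slices packed into one 'mega-buffer' via
# bufcompose~-style copying. Each entry records where a slice landed:
# name -> (offset_in_samples, length_in_samples).

class MegaBufferIndex:
    def __init__(self):
        self.entries = {}
        self.write_head = 0  # next free sample position in the container buffer

    def append(self, name, length):
        """Register a slice of `length` samples at the current write position."""
        self.entries[name] = (self.write_head, length)
        self.write_head += length
        return self.entries[name]

index = MegaBufferIndex()
index.append("slice-1", 4096)
index.append("slice-2", 2048)
print(index.entries["slice-2"])  # (4096, 2048)
```

Every read from the mega-buffer then has to go through this index first, which is exactly the extra layer of indirection being complained about here.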
It’s not worth beating on that drum for too long/hard though, as my thoughts on the buffer-as-data-container are well known at this point!
Indeed, but I'm not worried about the listeners (nor halls with listeners in them!); it's my playing and "feel" that I'm concerned with. If I hit a drum and then get a sample playing back 20-40ms later, it doesn't make performative sense. 11ms (or rather, 512 samples) is a "happy middle ground" where I can still feel/hear it, but it would be worth it if it worked well and was more accurate, hence my resistance in this thread to anything that would be slower than that.
In the future, once there are (more sophisticated) querying/playback tools, I'll probably do something à la multiconvolve~ where I analyze 64 samples (maybe even 32), then play back a transient that matches that immediately, while I am then analyzing the next size up, and play the 'post-transient' from another sample, then the next bit onward, etc… "stitching" together a sample as quickly, and as accurately, as possible. That would be an ideal implementation for this idea/use case.
So even with a multiconvolve~ approach, the first tiny “transient” analysis window could still take the intrinsic latency + the operational latency, regardless of how tiny the window was (32 samples + 20-40ms?).
That would be fantastic. I don’t know what that would mean in terms of analyzing a specific (sample accurate) onset (window) though. Plus you lose all the time series info, and potentially run into syncing problems between the different fluid.descriptors objects as well.
It seems like, fundamentally, I’m in between the buf and the realtime objects, in a way that neither is built for the use case(s) that I’m looking at.
I thought fluid. didn’t play nice with polybuffer~s after the alpha02? Or did I misunderstand that.
Again, I could be misunderstanding this, but if I use @fftsettings 512 64 in something like fluid.loudness~, I would be restricted to a temporal accuracy of my hop size (in this case 64), so if I'm interested in analyzing, say, 512 samples, I will potentially have +/- 64 samples on either end of it as well.
Whereas with the buf counterpart, my temporal accuracy is 1sample, and I can analyze exactly the 512 samples (or whatever) I want.
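The hop-grid restriction can be put in numbers with a small sketch (my illustration, not fluid.* internals): a real-time analyser only "sees" frames starting on hop boundaries, so an onset position is effectively quantised to that grid.

```python
HOP = 64  # hop size, as in @fftsettings 512 64

def nearest_frame_start(onset_sample, hop=HOP):
    """Quantise a sample-accurate onset position to the nearest hop boundary,
    which is the best a hop-driven real-time analyser can line up with."""
    return round(onset_sample / hop) * hop

onset = 1000                       # sample-accurate onset position
grid = nearest_frame_start(onset)  # 1024: the nearest frame the analyser can use
error = abs(grid - onset)          # 24 samples of unavoidable slop
print(grid, error)
```

With the buf counterpart you simply pass the exact start sample, so this quantisation error never arises.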
Unless what you're theorizing would mean that fluid.onsetslice~'s "fft clock" (is that a thing?) would only 'fire' when it receives a click~ at its input, and then it would analyze the last n samples with the given @fftsettings.
Plus, if there are diff fft settings (and potentially hopsizes) between fluid.loudness~ and fluid.pitch things get messy again in trying to resync that data together.
But yes, one could be trading one set of problems for another.
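One way the resync problem above could be handled is by timestamping each extractor's frames and querying both streams by time rather than by frame index (a hypothetical sketch, not an existing fluid. facility; the hop sizes are made up for illustration):

```python
import bisect

SR = 44_100  # assumed sample rate (Hz)

def frame_times(n_frames, hop, sr=SR):
    """Timestamps (in seconds) of each analysis frame for a given hop size."""
    return [i * hop / sr for i in range(n_frames)]

def nearest(times, t):
    """Index of the frame whose timestamp is closest to time t."""
    i = bisect.bisect_left(times, t)
    candidates = [c for c in (i - 1, i) if 0 <= c < len(times)]
    return min(candidates, key=lambda c: abs(times[c] - t))

loudness_t = frame_times(100, 64)   # e.g. a loudness stream at hop 64
pitch_t = frame_times(50, 128)      # e.g. a pitch stream at hop 128

t = 0.01  # query time in seconds
print(nearest(loudness_t, t), nearest(pitch_t, t))
```

The two streams never share frame indices, but a common time base lets you pull the closest frame from each when an onset arrives.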
tl;dr The difference from alpha 2 was that fluid objects don’t know about polybuffer~s, they only know about buffer~s, but this has no bearing on whether you can use polybuffer~ in your Max code to manage large / dynamic quantities of buffer~s that you pass into fluid processes.
Back in the olden days, I had thought it would be neat if people could pass into a fluid process either the name of a buffer~ or the name of a polybuffer~ (a collection of buffer~s) so that, for instance, all your NMF components end up in separate buffer~s automagically. However, internally, the downsides far outweighed the benefits because the objects are so different. In particular, it proved beyond me to get stable behaviour from objects that needed to access data in the audio thread with this scheme, so I axed it.
All that means, though, is that fluid objects, like all other Max objects that deal with buffer~s, only understand buffer~ objects; it has no effect on whether people use polybuffer~ in their patching. As far as fluid.x is concerned, there is no difference between a 'standalone' buffer~, mybuf, and one inside a polybuffer~, say mypolybuf.1.
So, for these things where one needs to throw lots of buffers around: if it's easier to programmatically control the creation of buffer~s, go ahead and use polybuffer~.
Oh right. So if I set a (single buffer~ inside a) polybuffer~ as output from a fluid. object (say @features mybuffer.1) that would work correctly?
And similarly for source/destination buffers from bufcompose~?
So effectively you could read/write from polybuffer~s, but not address it as an aggregate?
On that note, will something like bufcompose~ automatically populate/size a .1 single buffer~ inside a polybuffer~? Or would it involve "manually" creating the .1 buffer, sizing it, and then bufcompose~-ing into it?
Well, fluid.loudness~ doesn’t have fft settings, but your general point remains that your effective sampling rate with a feature extractor is a function of the hop size, yes. However, the temporal accuracy between the signal and buffer versions is no different: the difference is in how much control one has over submitting the portion that gets analysed (which is pretty much what you said, I know).
rambling aside about buffers (click to reveal)
Even then, the things don't behave all that differently (at the moment): the reason you get more frames in the buf versions than you expect is because I also return 'padding' frames, where the beginning of the sample to be analysed is lined up with the middle of a window. This will possibly change for descriptors (i.e. we've been talking about it) because, whilst the current behaviour is correct for things that you'd want to resynthesise later, it makes less sense for descriptors.
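For concreteness, here is how one common padding convention inflates the frame count (a sketch of the general idea; not necessarily the exact fluid.* scheme):

```python
def frame_count(n, win, hop, pad=False):
    """Number of analysis frames over n samples. With pad=True the signal is
    padded by win//2 at each end so windows can centre on the first and last
    samples (one common convention, assumed here for illustration)."""
    if pad:
        n = n + 2 * (win // 2)
    return max(0, (n - win) // hop + 1)

# analysing exactly 512 samples with window 512, hop 64
print(frame_count(512, 512, 64))            # 1 frame: the 'expected' answer
print(frame_count(512, 512, 64, pad=True))  # 9 frames once padding is counted
```

The padded count is what you'd want if you were going to overlap-add a resynthesis later, but for descriptor statistics those extra part-silence frames mostly just skew the summary.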
The equivalent issue in the signal domain is where time-zero is, and how that lines up with the beginning of a sample block of interest. As it stands, time-zero is when you turn DSP on, and we have no further say in how moments of interest line up with analysis windows. I was imagining that click inputs could be used to essentially reset time-zero, but have given absolutely no thought at all to how difficult / disruptive this would be.
But let's imagine it's a thing: you have some Rod-grade onset detector, and this emits spikes on events; these spikes could then be used to align the 're-starts' of a bunch of feature extractors, and of some kind of summary object (e.g. a signal-rate stats that doesn't currently exist). At the point that this produces output, you might be interested either in message or signal domain output, depending on what you want to happen next. If it's some kind of query, probably a message.
In my mind, at least, this seems pretty straightforward in terms of alignment and causality, but maybe I’m missing something.
So, totally ignoring actual fluid stuff for the moment, if I were imagining a feature-y thing in the signal domain, where I really cared about timing, it might look like this:
Obviously, like all MSP things, this scales poorly with the number of features, though mc.* can probably alleviate that from 8 onwards, or framelib, which was quite literally made for this sort of game.
Should be exactly the same as using any other buffer~, i.e. bufcompose~ will resize, populate etc. polybuffer~ exerts minimal ownership over its contents, beyond naming and offering controls for adding etc.
Yeah that makes sense, and doing this pseudo-realtime/JIT stuff is just something I nabbed from @tremblap as a way of being more precise with analysis frames/windows for realtime purposes. But it is actually a real-time process that I'm after.
Something like your (“deep fake”!) screenshot would work well, but is pretty different from a conceptual standpoint to how things presently work in the fluid.-verse.
And if there’s a click~ triggering it, it could (hopefully) theoretically not be limited to a hop’s worth of resolution and also account for different @fftsettings per descriptor type.
I remember ages ago asking for something similar for fluid.nmfmatch~ where you could be very precise about returning a single frame of analysis based on an onset.
There would be tons of uses for this “real-time-but-only-when-you-ask-for-it” type workflow, which would alleviate all the potential issues/problems with the buf stuff.
Totes, that would be great. And obviously makes lots more sense for real-time stuff.
Not really: you are after discrete time. SC has a demand-rate concept, which is what you are after. RT is a stream concept, whereas demand-rate is accurate whenever you want, as fast as possible. It is actually nearer NRT for me…
TLDR for all the posts above, but some quick thoughts.
2 - the deferring is all about resizing buffers, nothing else, but it is brought about here by the use of buffers as intermediary storage - the real-time objects don't/won't have this issue.
3 - from an engineering perspective the infrastructure in the client layer doesn’t care what the buffer it is writing to is (it’s an abstraction of a buffer that could write to any kind of structure you like), and it might in fact represent anything, so the design implications of being able to output to something other than a buffer would remain at the wrapper/environment level. That is to say it is technically possible to get all the buffer analysis objects to output directly to some other format in max without having to touch anything within the core code. Whether that is a design that might be considered is another matter.
4 - the idea that turning overdrive off will provide an answer is for me quite problematic - I don’t think it is generally viable on a retina screen - see next point…
5 - yes, graphics might get put at the back of the queue, but even if they do, once they start processing you have to wait for them to finish for the next thing to happen. That is why the times are erratic without the deferlow (which is giving you the operational time). You'll get noticeably better performance by opening Max in low-resolution mode, but the timing for events on the main thread when hopping between threads is basically an unknown - bear in mind also that anything you trigger off the back of that is still low priority, or you have to up the priority again - not nice. It's sort of OK for responding to general event triggers with low rhythmic tightness, but it won't cut it for musical timings.
Yes. Somewhere in the tl;dr is acknowledgement that this mode of use is not (yet/ever) accounted for in the design of the non-real-time objects
Somewhere further in the tl;dr is me wondering if it would make sense to consider how the real-time objects could be augmented to better support the specific type of thing that Rod is attempting, but in the signal domain. You might have good reasons for thinking this is a naff idea though.
True, but there are cans of worms (how is the mode specified? are different input and output container types possible? how do we handle intrinsically one-dimensional containers (i.e. lists)? how could this facility replicate across different hosts?)
4/5 - Thanks for the info. My only response is to shrug and say it depends on the music and depends on the patch. I have a piece that currently runs without overdrive (otherwise it deadlocks), and that is fine for those particular musical needs. I think I was clear enough that it wasn't without temporal consequence, somewhere in the tl;dr.
2 - the issue there is that handling chunks of audio at specific times looks quite like a whole new infrastructure (à la framelib).
3 - yes - I’m not arguing it would be easy.
4/5 - For me, I don't think that overdrive off answers either the specific set of concerns here or a more generally applicable set of concerns. Of course there may be situations where overdrive off might be viable, but I'd consider them few and far between. I believe the situation with timing in Max 7/8 to be far worse with respect to the main thread than in older versions (e.g. Max 4), as the graphics are considerably more involved and the retina resolution comes at significant cost. If Rod removed various graphics from his patches the times would most likely improve significantly, but I think that having to consider what is on the screen in terms of speed is considerably more burden than is ideal. Scrolling around a window with a metro into a click~ and overdrive off on a retina screen should give some indication of how bad it can get. Resizing the window can stop the metro indefinitely. For me that is far too flimsy a scenario to be reliable for realtime usage.
With the defer in place, it is "faster" (back in the 0.5ms range), but the gains here are artificial (measuring only the intrinsic, and not the operational, latency).
I may mess around with using the fluid.descriptors~ for the database creation, and then (not.fluid.)descriptors~ for the “real-time” stuff as an experiment since in my intended use case I’m not really going for perfect resynthesis, but rather corpus browsing, so consistency between algorithms (although not great) isn’t paramount.