General purpose vs Specific use case (or: secret sauce, and where to get it)

The discussion in the fluid.transients~ thread was starting to veer quite off topic, and into an area that could probably benefit from its own thread and discussion, so making a new one here.

Starting from my post/prompt of:

This is a bit OT, but there’s been a bit of mention in a couple of threads now about where the generalizability of some of the algorithms break down and/or how often “real world” applications do some kind of contextually appropriate pre-seeding and/or algorithm tweaking. Is that level of specialization and/or fine-tuning in the plans/cards for future FluCoMa stuff?

Like, architecturally speaking, some of the stuff discussed in the thread around realtime NMF matching (and semi-reworking the concept from the machine learning drum trigger thing) just isn’t possible in the fluid.verse~, because there is no step that allows pre-training/pre-seeding on an expected material to then optimize towards.

It’s great that all the algorithms work on a variety of materials, but from a user point of view, if they don’t do a specific type of material well, the overall range doesn’t really help or matter.

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

That’s good to hear, as I’ve kind of placed that hope in the “not gonna happen” part of my brain based on various discussions on here and in the plenaries where the “tools would be made available” and anything else “we could code ourselves”.

Pre-seeding, yes. And I did make use of that, but I guess I meant more of some of the stuff discussed in this thread, namely the “Neural network trained on pre-labeled data” step in the graph below.


That’s just one example that I know, and I certainly know enough to know that I don’t know much. But it always seems to come down to some secret-sauce variation and/or implementation that isn’t a vanilla take on an algorithm, which is what makes something work particularly well (and which often takes into account certain assumptions about the material it will be used on).

Maybe some of these thoughts/concerns can end up in a KE category along the lines of what @tutschku suggested in the last plenary, where there are specific settings/examples/approaches that suit certain types of materials, and are presented as such. Which “somewhat” differs from the “here are the 30 parameters that you can play with, they are all intertwined and do very subtle things…have fun!” approach that most of the objects are designed around.

So it’s not a matter of making every parameter available (which isn’t always useful in and of itself (i.e. the still enigmatic fluid.ampslice~)), but rather whole approaches which aren’t (yet) available with the “general purpose” tools/algorithms.

AND

I could just be uninformed/wrong about some of my thinking here. Even if that is the case, given my level of engagement on here, it means some things aren’t coming across as well as they could.

The problem here is the extent of the diversity of examples. Just look at onsetslice for instance. Changing the metrics changes everything, and the very carefully curated list of audio files barely scratches the surface of what creative sound designers might care about…

That is why the industry does blackboxing. Physion barely works, most database sorters are for small single-file percussion, and all the concatenative orchestration packages I found are complex in interface… and drum trigger people do something you like because they have taken a very, very, very narrow subset of possibilities. The ethos of our project from the outset is to give composers more power, and therefore more responsibility. As @weefuzzy said, the interface balance is to be tweaked, but we have already curated quite a lot of the parameters… just look at this:

This is very well curated, yet one level further down than FluCoMa (devised by @groma in a previous life), and it is even more open. Then look at this: http://audiostellar.xyz/ or this: http://imtr.ircam.fr/imtr/CataRT - both of them assume a lot, and will excel in a single task (the former towards very short sounds, the latter towards granulation with semi-perceptually relevant parameters).

===
So where does that leave you:

  • one extreme: if there is a commercial product that does what you want, use it. It’ll very likely be better than anything you’ll do by yourself.
  • the other extreme (not your case for sure): if you are not afraid of doing DSP, prototype in Python then code in C++; you’ll have a lot of control over a lot of details we already curate.
  • if, like me, you are dissatisfied with the level of curation in commercial software and you are ready to experiment with something in Max/Pd/SC, then you are in the right place. But you have to keep your expectations in line with a research project exploring interfaces, not a service for specific needs. The overall agenda is flexibility of interface, which comes at the cost of 1) more knowledge needed (hence the KE work of @weefuzzy, all the examples, and this forum), 2) more exploration needed, and 3) trial and error on all sides.

I’m sure you heard me explain that before, but it is very important that everyone understands the agenda: neither of the extremes in the above list, but a new possibility. For my case, as you will see in the next plenary presentation, the FluCoMa tools do things that are not possible elsewhere, and complement a workflow with many other things.

Happy experimentation!

I get that, but at the same time a curated list of settings/approaches for different materials is equally useful, given that you’re aiming for the techno-fluent crowd (and not the programmer crowd). I’m assuming that’s in the works, and those of us in the first batch will obviously live through most of the growing pains. But taking the example of fluid.ampslice~: I kind of understand what’s happening with the algorithm, since the Max version has been my go-to onset detector for a couple of years now, and after making the thread asking questions I understand (most of) the parameters, but I’m not using it in any of my patches. It doesn’t make sense to me. (Part of that is that I haven’t been able to get consistent results across a range of test material, so I don’t feel comfortable that it will behave as I expect in a performance context.)

I guess that’s a main part of it for me. What if I want to make a lot of assumptions about the material I’m using, in order to excel in a single task? That often isn’t exposed architecturally.

(So here I’m not exactly sure what I’m asking for, other than knowing, from comments like yours, that because those approaches assume things, they can do stuff that isn’t possible with a general-purpose algorithm(ic approach).)

I know you often use this as a negative thing, but FluCoMa isn’t exactly not a black box either. There is a paradigm that is absolutely locked in, based on assumptions about how users will use things (a single example is the buffer-based paradigm, which comes with a lot of baggage and is completely blackboxed, in that there is no other way of doing things).

I don’t mean to (further) suggest that should change, but just to point out that there is no such thing as a neutral or open system. They all make assumptions, restrictions, limitations, etc… So to say that FluCoMa is fully open just means that it lines up with your assumptions (which obviously makes sense, since you’re driving the project!).

There’s a lot to unpack here, and I’m not going to attempt to bounce off all your points.

There’s a pretty involved coupling between how interfaces can be presented in the most useful way, and being able to articulate useful approaches with a given algorithm for some particular material. For many of the algorithms that are in the decomposition toolkit, the latter (i.e. useful approaches…) is stuff that is still waiting to be discovered. We’re not withholding information; rather, practical information comes through practice, and there isn’t a documented body of that for a lot of these things. This was part of the rationale for making some of you go through the growing pains: a documented body of practice starts to emerge, and helps us improve things in a more informed way.

With that said, something for me to reflect on is that by getting more involved in some of the development over the last 18 months, I’ve neglected an opportunity to work more closely with you in investigating the fit between your artistic goals and the affordances of the toolkit. Something to improve on next time round, where these issues of interface, generality etc. are only going to get more acute.

Both you and @tremblap are right about black boxes: no interface is neutral, but at the same time, boiling things down to minimal curated controls forestalls a great deal of musical possibilities. So, for me, letting it all hang out as a design ethos was / is a point of pragmatism rather than principle. We have to start with something, and in the absence of an informed basis on which to decide what’s important, exposing more rather than fewer things leaves more room for serendipity, playfulness etc. Likewise, given a commitment to supporting a number of different environments, buffers make a lot of sense as a baseline interface that builds in some consistency (but most assuredly creates a number of pain points as well).

Something I’m working on at the moment is trying to document the underlying modelling assumptions of the toolkit algorithms, as this will at least help us start to manage our expectations about what objects might do well in what contexts, and to get a handle on what some of the more mysterious parameters might mean. For instance, HPSS assumes that it makes sense to think of your signal as a sum of spiky bits and tonal bits, and (except for silence) it’s always going to find something based on that model, but the less your material fits that model, the less sense the results will make (e.g. try running cell-f2.aif through it and listening to the percussive output).
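To make that model concrete, here is a minimal median-filtering HPSS sketch in Python (a generic, Fitzgerald-style illustration of the idea, not FluCoMa’s implementation; the function name, STFT settings and kernel size are just illustrative choices). Every bin is pushed towards either the tonal or the spiky estimate, which is why material that fits neither model still produces output:

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft, istft

def hpss_sketch(x, sr, n_fft=2048, hop=512, kernel=17):
    # STFT magnitude: rows are frequency bins, columns are time frames
    _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(X)

    # "Tonal" estimate: median filter along time (horizontal ridges)
    harm = median_filter(mag, size=(1, kernel))
    # "Spiky" estimate: median filter along frequency (vertical spikes)
    perc = median_filter(mag, size=(kernel, 1))

    # Soft masks: every bin gets split between the two models, whether or
    # not the signal actually fits either of them
    eps = 1e-10
    mask_h = harm / (harm + perc + eps)
    mask_p = perc / (harm + perc + eps)

    _, h = istft(X * mask_h, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    _, p = istft(X * mask_p, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    return h, p  # "harmonic-ish" and "percussive-ish" signals
```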

Secret sauces:
A thing to bear in mind is that there really aren’t any (or many) actually solved problems in machine listening, so getting anything to work IRL at all generally involves some tweaking, whether through pre-/post-processing, tweaks to the algorithms, and what have you. This goes equally for magical-seeming machine learning systems, except that the secret sauce now frequently involves a complex interrelationship between one’s model and one’s training data.

Commercial entities are going to have to grapple with a trade-off between what works well enough to satisfy enough customers vs pleasing a wide range of people, and they certainly aren’t going to divulge any more detail than they can get away with; boxes on patent diagrams that say ‘neural network with pre-labelled data’ may as well just say ‘computer program with some code’. Meanwhile, engineering literature has to prioritise reproducibility and comparability with related work, so a lot of the evolution of algorithms in that context might well focus on strongly stated but unrealistic assumptions and show only incremental improvements, which rules out often being able to say definitively ‘yes, this is the thing that performs best in situation x’.

What we’re left with is a situation where a lot of questions that are more pressing to practitioners may be un-investigated, or hard to find information on, e.g. ‘what happens to this machine listening algorithm on a loud stage?’ (or with polyphonic material, or a reverberant room, or with a compressed signal etc etc). As you know, my own approach is never to assume that a machine listener won’t break down in hilarious ways, and to treat this certainty as musical territory. For consistently predictable results though, you will probably find that a single algorithm / model won’t cut it for all but a very narrow set of circumstances, especially given other constraints like minimal latency etc.


I appreciate your thoughtful response(s) and part of my pressing on these things is for the sake of the process and the project at large. (i.e. I’m over the buffer thing, but it still serves as an example to point towards in a context like this). So me voicing where I find things difficult/missing/frustrating is a gesture towards (hopefully) improving things for others.

Sometimes it does feel like mixed messages though, where I’ll comment on something I’m unsure about, or find lacking, and I’ll either get a response like “you can code it yourself” or “we’re working on it”. Neither is ideal for the “now”, but the former leads to some of the frustration voiced in my post(s).

To address one of the more specific threads of this thread, the “parameter dump” thing: I disagree with this idea, in that a parameter-dump approach is still ‘curation’; by not making those decisions, you put it on the user(s) to figure out, and, practically speaking, without knowing what is important, the parameters themselves lose meaning and usefulness.

Or in short: by making everything adjustable, nothing will be adjusted.

Now, I’m not for fully black-boxing things, but part of the reason to use these tools (at all / in the first place) is the curation effort involved on the conceptualizing side. I now know something about NMF, which has a huge footprint in the toolbox, out of many (presumably) possible decomposition algorithms. Why this algorithm out of many? Why not include versions of all of them? I would assume(/hope) that it is because it was found to be useful. That was a decision that was made, and as a user I have benefited from it.

It is obviously a balance, since the fluid.machinelearning~ object that only takes a bang as an input (and creates an unlabelled buffer~ of course(!)) probably won’t be of much use to anyone.

Cheers @rodrigo.constanzo

I think both the mixed messaging and managing the effect of letting all the parameters hang out relate back to the consequences of me being more tied up with deep coding than with being here. [Of course, this work is meant to be invisible, which of course doesn’t help give the impression that we’ve been pedalling hard to make things better…]

In respect of dealing with blockages, being able to say ‘let’s code it together’ would presumably be an improvement on ‘code it yourself’, both in terms of getting more immediate use out of the stuff, and avoiding reproducing an alienating customer service dynamic.

And, of course, you’re quite right that throwing all the parameters out there creates a signal to noise problem (FWIW, I don’t disagree at all that this constitutes a curatorial stance): if we’re going to do that – and I still think it has pragmatic value in a discovery phase – then, clearly, it behoves me to take some responsibility and actively join folk in trying to distinguish the signal from the noise.

I don’t think any of us believe that we have the last word on the interfaces as they stand (although our tolerance for many parameters does vary). It’s still a slightly open question about what the most principled way would be of hiding (without removing) some of the complexity, and we’re totally open to ideas.

Why NMF? Well, @groma will be able to provide an authoritative answer, but I’ll have a bash. I should add though that we also don’t plan to stop adding things to the Fluid Decomposition Toolbox. Things are paused at the moment because the premières are coming up, and we’re trying to clear the bug list / tidy up for public open source release, and produce the first objects for the companion toolbox. So, NMF was a starting point, rather than an ending one! In particular, it’s an algorithm that Gerard has a lot of experience with and understands well, so was able to produce a working implementation quickly. It’s also very versatile, quite simple, reasonably easy to get useful things out of, doesn’t require unrealistic amounts of training data, and has a lot of previous application to engineering-type audio analysis (albeit less for creative purposes).

There are perhaps fewer mature, general-ish audio decomposition approaches in this vein than you’d expect. There’s a lot of stuff that people have picked up and put down again over the years, and a lot of things that are interesting looking, but very computationally expensive, or poorly understood for audio. My expectation is that some of the things that emerge in the second toolbox will prompt us, and the community, towards what might be immediately useful to add to the first, as it becomes easier to start herding our sounds and analyses around en masse and in bulk.


Hello. As my colleagues know, I also tend to prefer interfaces with fewer parameters, although it is not always easy to choose, and there is the danger of hiding things that can be exploited creatively in unexpected ways. Also, this can have relatively easy solutions, as objects can easily be wrapped into abstractions with a simpler interface, and defaults are provided for most things. I guess it partly boils down to familiarity with the algorithm, and the documentation and knowledge exchange website should be able to help with that as well.

As for why NMF: for me, the main reason is that it is a very versatile algorithm. Unlike source separation methods using neural networks, it is essentially unsupervised, so you can have it find components that are in your audio without telling it much. It definitely makes fewer assumptions than sinusoidal extraction, transient extraction or HPSS about what the signal you are feeding it is. The only assumption is that there are some spectral patterns that are repeated (which often happens with instruments but also many forms of music - it is quite helpless for purely noisy stuff, though).
But it can also be used in a supervised fashion (and about that discussion, you can totally train its bases or activations with lots of data, and then seed these to another instance).
So yes, for an audio decomposition toolbox aimed at music creation it should be quite useful IMHO. This versatility makes the interface more complex, and, in my opinion, with companion objects like NMFMatch and NMFFilter we have already started pointing at some useful, more specific applications (apart from the fact that they work in real time, of course).
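For a rough sketch of that unsupervised vs. seeded distinction, here is a generic Lee–Seung-style NMF with multiplicative updates (Euclidean cost) in plain NumPy. This is not FluCoMa’s implementation; the function name, rank, iteration count and the random stand-in spectrograms are purely illustrative assumptions:

```python
import numpy as np

def nmf(V, rank, n_iter=200, W=None, fix_bases=False, seed=0):
    """Factorise a non-negative matrix V (bins x frames) as V ~ W @ H.

    If pre-learned bases W (bins x rank) are given and fix_bases is True,
    only the activations H are updated: a 'seeded'/supervised decomposition.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-10
    W = rng.random((V.shape[0], rank)) + eps if W is None else np.array(W, dtype=float)
    H = rng.random((rank, V.shape[1])) + eps

    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean (Frobenius) cost
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fix_bases:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Unsupervised: find 5 repeated spectral patterns and their activations.
rng = np.random.default_rng(1)
V_train = rng.random((513, 400))   # stand-in for a magnitude spectrogram
W_train, H_train = nmf(V_train, rank=5)

# 'Seeded': reuse bases learned on training material on something new,
# updating only the activations - roughly what seeding bases into
# another instance means.
V_new = rng.random((513, 200))
_, H_new = nmf(V_new, rank=5, W=W_train, fix_bases=True)
```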
