Automatic Separation in soundscapes


This morning I saw this - it made me think of my example of nmfmatch, but with better results and a much, much better interface, obviously!

Kind of fun!

Thanks for posting this.

I am definitely at the stage where the big orange button ‘Find All Events of the Chosen Type’ is of most interest to me. Using the tools as they are now has been very useful for cleaving samples apart in different ways and getting at other types of material embedded in the source. What I would love is some strategies for making connections between what I find at this stage and other parts of a database of sounds. I am under the impression Toolbox 2 tackles some of these wants, which others have expressed too.

Just my 2c too: web interfaces absolutely smash anything that can be made in Max.

Definitely more effective and slicker.

I guess it’s tailored towards a known source ("…extraction and evaluation of ecologically-meaningful soundscape components…") so perhaps that helps with efficiency/effectiveness?

A couple of other thoughts on this:

  • Being able to easily seed nmf queries (in this case, a single button). I remember that @weefuzzy mentioned something about making an abstraction which handles the creation of seeds in an easier/more automated way. That would make the exploration of seeding much more playful and…erm…fluid(?), since it probably takes a dozen objects and a couple of minutes of coding to whip up a simple seed->nmf test.

  • The interface is quite slick, although I’m not a fan of open/web interface stuff since there is less standard/known behaviour to work with. I think it’d be possible to create slicker and more usable interfaces in Max (and SC/pd), which will be more useful for those languages going forward. (That being said, having more stuff like this in the knowledge exchange (is the /ke page dead for now?) would be welcome.)

  • Are there going to (eventually) be provisions for dealing with “known” material types? A lot of the discussion around the Sensory Percussion trigger stuff involved specific things that could be done to the algorithms knowing that they would always be dealing with short, quickly decaying drum sounds. In this example here (not having read the paper), there may be similar assumptions: that there will be “background noise” and “different kinds of birds”. Obviously there are times when you want to explore unknown and potentially completely open and varied samples via nmf and whatnot, but other times there will be known inputs; if it is possible to tune/tweak/optimize the algorithms for those, that would be useful, particularly if the alternative is a jack-of-all-trades, master-of-none algorithm.
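To make the seeding idea in the first bullet concrete, here is a minimal numpy sketch (my own illustration, not FluCoMa's API or objects): a plain Lee & Seung NMF where some basis columns are pinned to user-supplied seed templates, so the remaining free components soak up whatever else is in the material.

```python
import numpy as np

def seeded_nmf(V, W_seed, n_extra=2, n_iter=200, eps=1e-9):
    """NMF with some basis columns fixed to user-supplied seed templates.

    V       : (n_bins, n_frames) non-negative magnitude spectrogram
    W_seed  : (n_bins, n_seeds) fixed "seed" templates
    n_extra : free components that absorb the remaining material
    """
    n_bins, n_frames = V.shape
    n_seeds = W_seed.shape[1]
    rng = np.random.default_rng(0)
    # basis = seeds plus randomly initialised free columns
    W = np.hstack([W_seed, rng.random((n_bins, n_extra))])
    H = rng.random((n_seeds + n_extra, n_frames))
    for _ in range(n_iter):
        # standard multiplicative updates (Euclidean cost)
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W_new[:, :n_seeds] = W_seed  # keep the seeded templates fixed
        W = W_new
    return W, H
```

The activation row `H[0]` then tells you where the seeded template occurs in time, which is essentially a one-button "find all events like this seed" query.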

The original paper on which this is based is here

They’re using sub-sampled constant-Q spectrograms (so log frequency, unlike the linear frequency of a plain FFT), and a 2D embellishment of NMF. On a constant-Q axis, the fancier NMF they use can latch on to harmonic templates at different frequencies. Ergo, it’s good at separating repeated harmonic spectral templates that move around in frequency.
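A quick way to see why the constant-Q axis matters: transposing a sound multiplies every partial's frequency by the same ratio, and on a log-frequency axis that becomes a pure translation in bins, so a single harmonic template can be matched at any pitch by shifting it. A tiny numpy check (the bin formula is the standard CQT mapping; the particular numbers are just my example):

```python
import numpy as np

# On a constant-Q (log-frequency) axis:
# bin(f) = round(bins_per_octave * log2(f / f_min))
# so bin(2 * f) = bin(f) + bins_per_octave, i.e. transposition = translation.
bins_per_octave = 12
f_min = 55.0

def to_bin(f):
    return int(round(bins_per_octave * np.log2(f / f_min)))

# a harmonic template for a note at 110 Hz, and the same note an octave up
note      = [to_bin(110 * k) for k in (1, 2, 3)]
octave_up = [to_bin(220 * k) for k in (1, 2, 3)]

# the whole template moves by exactly bins_per_octave bins
assert all(b2 - b1 == bins_per_octave for b1, b2 in zip(note, octave_up))
```

On a linear FFT axis the harmonics would spread apart instead of translating, which is why plain NMF struggles with pitched material that moves around.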

Such a thing will appear in the examples folder in due course.

I really don’t see much difference. The first port of call for prototyping an interface like this in Max would be JS in any case. I haven’t looked closely enough at this example to gauge how much of the heavy processing is being done in the browser (if any), but that’s the big bottleneck for doing things like this in the learning resources (for arbitrary material).

Of course, there’s some prior art out there that we might be able to take advantage of, e.g.

Yes, it’s going to move to its own sub-domain when I get back from vacation. I’m going on a KE sprint (so all suggestions are welcome).

I’m not sure what you envisage here. Part of the project, of course, is discovering some of these things collectively. We don’t know, a priori, what optimal approaches for given material types with given algos might be until people try some stuff, nor what kinds of things people might want to do. Some of this accumulating knowledge will find its way into the docs, for example with the sort of object-specific discussion of FFT settings that @tutschku has asked for. Likewise, some of this domain-specific stuff could go into the KE.

What’s less likely is that we start making black boxes that claim to be tailored to something specific, because such claims tend to obscure their underlying assumptions and interfere with open-ended experimentation. Kind of like presets on compressor plug-ins that say ‘guitar’, but don’t make it clear to users what assumptions about guitar sounds have led to specific choices about settings: I’m not positive that those sorts of things help people get better at using their gear.

Now, in the Sensory Percussion stuff you discuss in the other thread, a good portion of the magic seems to hinge on the fact that they have a pre-trained neural network model, and some way of mapping between user-trained data and this pre-trained stuff in order to do some classification / clustering. This is all very much toolbox 2 territory, and one open question for this stuff is how useful large, pre-trained models really are to musicians (who, we are thinking, are more often interested in particularity than generality). Larger models require several things that might be cumbersome for musicians: lots of processing power, time and (well-designed and labelled) training data. What might be of greater general use is being able to document and share the knowledge that presumably went into some of the larger decisions that underpin the model SP used, e.g. which features are useful to feed to neural networks for particular types of thing.
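For illustration only (this is my guess at the simplest possible version of the idea, not how Sensory Percussion actually works), mapping a handful of user-trained examples to classes doesn't necessarily need a big model at all; it can be as basic as a nearest-centroid lookup over feature vectors:

```python
import numpy as np

def train_centroids(features, labels):
    """features: (n_examples, n_dims) array; labels: list of class names.
    Returns one mean feature vector (centroid) per class."""
    classes = sorted(set(labels))
    return {c: np.mean([f for f, l in zip(features, labels) if l == c], axis=0)
            for c in classes}

def classify(centroids, x):
    """Return the class whose centroid is nearest to feature vector x."""
    return min(centroids, key=lambda c: np.linalg.norm(centroids[c] - x))
```

The interesting (and undocumented) part in real products is presumably the feature extraction in front of this, which is exactly the kind of knowledge worth sharing.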

The stuff that @groma presented at NIME this year shows one possible direction that will probably feature in toolbox 2 in some form.


Awesome, with bated breath.

Same goes for the KE stuff. It might be worthwhile making a KE thread and posting the structure/roadmap there for people to offer thoughts and point out gaps in knowledge.

I don’t know nearly enough about the underlying algorithms and such, but what I mean is more along the lines of what you said in the second chunk. Where certain tweaks to the algorithm end up being “better” for certain kinds of material.

In my specific case, although the sounds I may be working with will vary a great deal, if I’m on drums/kit/percussion, I can probably make certain assumptions about transients and amplitude envelopes in the material. This obviously won’t always be the case, but I can go into it knowing that, and if it makes the difference between an algorithm being usable or not (fast nmf-ing, like in the faux-Sensory Percussion tests/patches which ended up being too slow to be useful), then for me, it’s worth exploring.

All of that isn’t to say this is an argument towards or away from a Black Box (:black_large_square:) paradigm; surprisingly, I’m arguing for a more open approach, even where that means options like “pre-training a neural network model” aren’t available in FluCoMa (now (or ever?)). As I said, I don’t really know or understand enough about what these kinds of things are, but it is something that seems to come up when encountering “real-world” examples of ML stuff. The implementations are rarely general.

This thread is interesting. I very much like @weefuzzy’s vision for the project :wink:

This is a very good idea indeed. @weefuzzy and @groma are on holiday now, but as we planned to start the beta today, I might put up a placeholder thread now.
