Fluid.bufnndsvd~ experimentation

Here is a small patch for experimenting with fluid.bufnndsvd~. I’ve been using it today to compare it with my efforts at clustering over a division of components through nmf.


----------begin_max5_patcher----------
1200.3ocyY00ihaCE8Y3WgUTqTqDC014CH8kR+Azm1U8kpUqLINLdahCJ1vv
nU67au9KfLCAH.dFpFMD7EGt2iuGeuGG99vAAyq2PEAfeG7OfAC99vACLlzF
F3FOHnhrIqjHLSKHqtphxkAireljtQZr+X8SfJB+Y.SBjOx3KDfr50zFPB5m
A0E.QcE0XG7K4rhBZCkmQE+1ZRCiHY07+3WALt5NopYthmu86ujwoYJCFmD4
LxxMtrd92dHJY6L4qpX7RpzDln8FqWI2ZE5rtjHyzgxWanYRK3CilNNdD.OA
NFNBfhMWRTuB9h9d9wvg5WFciKRBZoxipElpk07VSnMfh7CfPPCDhwIs.D9Z
ADm9jJ3N.ORvb.qKPfu.PfONHhhMXXp4BN0jgvZH3tE62h74kT67ClS3KBFA
BXZByUAzUUyoMcAITvt3rgn3xzluR4j4kz1IhaCt3zDCDs30kxNId0X0f4qB
rEk0pXqCvFld0fsntohXvSxECeHVi3nvOH3eDRMqy8kgS5jRi6DjnSruTQi0
3JB1WN8UykqnBAYA8..tq7i.7SnQ.6dlCAbheArkTivQ8Hs5yzYNI6E.Bf6B
h3K.hvyUlZhi2FuGf9r.T3GPAHW8V78cCnoe+K.nJqo9qikBjeXlVzlXuLcx
4QqfsfSJ8JVWPkDorAHHUKEfYkLgjx6Dz3oWPO0vSreDE1poJJbR+xxl+8Ix
muRqB7EPCU7LW9XWHN0OpHPSvsSzomGvpFipdXuOMWxoaNInS7J4NEdGI2Ek
qX4iUIZdUwKfYJg8MYF885KyHYR1Zi1egy1XkIvr4DAcmE0.vrhBoPEqlCUD
ASS.wHr8MybKjaWPU2cYc1+pOlADLiopQ5bfRNr6atpNmppqncu8scdTiIcj
YfdZCnsgXnsJKJwdA1Kcs6HldVvG5tH3ywQgeLcaNtbH0wTUevoDCgl3yckH
n6xzOXsPLdQ8Ks1Z0ERCu.Vd54PJFYOOML7BJ5paB9JxdedyMcpuiuu.eWNH
DJ1br.D7tpCaaG5SxWR7zI8saIlz+MFue8neCtUMJ5.2wdE2uptv+OvM3yMz
p4kjme3OIe5gOo6t94mp+6ZVFU7veM9Ix5tNjhWWUPtm6yccUYuHFdtXc9g5
XZq3XWuD33z3SJbQs7z5b3XkAUrr2PR+0G0o9kvNzu3G4KNppspTns4cT5MH
dwbiln+MOHZiy01ecRyhb22sCXJAcaceNUnVvMqaslitPN3XmH+RbD7bNJzC
NRK23rdxGHJpOqc5ijzZRUr7k0pVstrkU+ZRnsQeTTGilFYezOXcWMyHeD1m
MS7p47lnNNxDKPiT7VCtw.KD0i.KxKbQXOxb5mi6M6I8CB77jQnO7Ten8QmL
uhBs+NNIFJWHz7qeXFsu9TSNswHe68OXeyZ2NWitsxC5ZLSOmqm3CV1z9Tev
GrLbeXYg32iky96ZjuIQ8obw1v61Sj8oh4Qqyus7ssjY6Q2Zytz9DZGxls5F
HKWpjbIby13DkrtuUaRNSGYFx31gFsNAMz0rsy2nyIfznD4HUJbV0XBrfMI1
eE1.8SHpguh4HYJ3oboQxHWcZPwRhEIFkkC+wv+yHUw1t
-----------end_max5_patcher-----------

Is this not doing double the work here?

From what I understood of @tremblap’s explanation, fluid.bufnndsvd~ (what a mouthful!) will compute how many components are needed and then do the work of running the nmf.

(oh, I guess you’re updating the filters after the fact… though does that “work” if you use the same audio as the nmf itself?)

Yeah, I wasn’t sure what the intended interface was and worked from memory of what was discussed last week. From what I can see in the attrui, there is no resynth attribute either, so I think you have to either use NMF with seeded/fixed bases or do some clever vocoder-type thing, which I believe is used somewhere in the help patches.


Since this will eventually become public, I’m sure @weefuzzy and @groma will help me make a help patch for this with simple explanations (as well as the 2 nmf-based cross-synths). Stay tuned!

It took me a bit to figure out which object did this, but I found it again in the end. From the look of the helpfile, I wasn’t even sure it was the correct object, as I was looking for “that object that tells you how many components are needed to achieve a certain amount of coverage”.

Looking at @jamesbradbury’s original patch, and going on my understanding at the time, is this the intended usage? You use fluid.bufnndsvd~ to figure out how many components you need and to get the initial activations and bases, but if you are after a resynthesis you need to run a separate fluid.bufnmf~ anyway.

From what I can tell, you can’t use fluid.bufnndsvd~ to only figure out how many components are needed, as it seems unhappy if you don’t have @bases and @activations declared.

It’s unclear from the fluid.bufnmf~ helpfile whether the @bases and @activations need to be recomputed or whether you then have to update them (as in @jamesbradbury’s patch).

So this is partly a bump, partly a question about the intended use of this, and partly to see whether it’s possible to find out how many components are needed for a given buffer and then work out the resynthesis based on that (without double-computing stuff or updating filters).

I won’t make a separate feature request as it’s relevant to this post, but is there a technical reason why fluid.bufnmf~ couldn’t take a @coverage + min/max attribute instead of a @components one?

Also, it’s not very clear from the reference what coverage means:

Fraction of information preserved in the decomposition.

Isn’t everything preserved by the decomposition? Like if I ask for 1000 components, I’ll get something in all of them (in varying amounts) and it will all sum back up to the exact thing, so nothing is lost.

Does this mean how many components are required so that each one is unique/distinct in some way?

quickly from the airport:

the interface of nmf is one that allows choice of components (removing a useless computation when one knows what one wants)

the other question has problematic assumptions, on signal and on spectrograms. I’ll try to make statements here that are true but @weefuzzy and @groma will say it better and truer:

  • a factorisation is never perfect
  • you will get something in every component. it does not mean they are musically relevant (too quiet for instance, or artefacts of the spectrogram creation)
  • what a human (and the kind of musician we are, even more) thinks is unique and distinct is not the same thing as what a pure mathematical process would think. Hence nndsvd having 4 methods to give you various guesstimates of how many components would best factorise a given spectrogram of magnitudes (itself an imperfect model of the signal); see the sketch below
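
A rough sketch of the coverage idea in Python/numpy, purely as an illustration of the maths and not the FluCoMa implementation: take the SVD of a magnitude spectrogram and read off how many components you would need to account for a given fraction of the variance.

```python
# Sketch only: estimating a component count from "coverage", assuming coverage
# means the fraction of variance accounted for, as in PCA/SVD. V stands in for
# a magnitude spectrogram (bins x frames); a real one would come from an STFT.
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((513, 400))

s = np.linalg.svd(V, compute_uv=False)        # singular values, largest first
coverage = np.cumsum(s**2) / np.sum(s**2)     # cumulative fraction of variance

for target in (0.5, 0.8, 0.9, 0.99):
    k = int(np.searchsorted(coverage, target)) + 1
    print(f"coverage {target}: {k} components")
```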

My approaches:

  • maximalist with nndsvd, and minimalist with an arbitrary small number of components. The former gives me a number/ballpark for complex hybrids or spectral manipulations; it also provides seeds for the nmf process (potential bases, for instance: not just their quantity but their quality). With the latter I might curate a seed (like my pick examples). I get a lot of fun sounds that way.

So it’s just an interface thing, not some kind of intrinsic computation thing.

I guess in a more sound design-y context, you may hear a sound and say that you want “three things” out of it. That often doesn’t work out with the sounds I feed it anyway, but I’d rather poke and see what falls out, hence something like @coverage (if it works that way) would be a good option for that.

Totally. This is me trying to figure out what “coverage” and its definition in the reference file mean.

So does “preserved” here mean that it tries to tell you how many components would be required to have something (computationally) “good” in each component?

I guess this may just be a computational point of reference, like “coverage” referring to a factorisation(/maths) thing, while also overlapping with a more common usage of the word “coverage” (like how many components it would take to “cover” that sonic space).

Although nothing is truly generalizable, I’m trying to figure out a way to record whatever (sources/samples/synthesis) into a buffer and then decompose it into a sensible(/useful/good/whatever) number of components. I can pick an arbitrary number and go with that, but I figured this could be a good way to have things broken down into an amount based on the content of the buffer, as opposed to my arbitrary choice, since I don’t know what will end up in that buffer.

On a more practical note, if I want to only use nndsvd to figure out how many components I want, I would be double-computing things by then feeding that number into fluid.bufnmf~, or perhaps over-computing, if I’m figuring out bases/activations and then refining them with a subsequent pass of fluid.bufnmf~.

there are so many wrong elements in that sentence!

  • I said the opposite: it is a computation thing so the interface is streamlined to give the lowest hanging fruit
  • ‘just’ interface…

anyways: you have 2 objects to do exactly what you want. make it sound good!

Hehe, I mean that the reason fluid.bufnmf~ doesn’t have @coverage is an “interface thing”. There’s no mathematical reason why such a thing would be impossible, which is sometimes the answer to the questions I ask.

Obviously interface is a complicated thing.

I’m still not sure how the object is supposed to work though. The helpfile and reference are sparse and in the case of the helpfile, quite confusing.

Does chaining them together like in @jamesbradbury’s patch compute stuff twice?

The objects you point at do 2 different things. One makes an estimate of parameters, the other does the factorisation. The former has the luxury of providing you with 4 different algorithms to do that task. Hours of fun exploration.

As a working method, if I may, I recommend you refrain from optimising early. In this case, for instance: test the interface the way you want it to work, you have everything you need to do it, and make it sound good. Then people who actually designed the code under the hood carefully might spot a redundancy and propose an optimisation, like @weefuzzy does so often (to my code too!)

As far as I understand the 2 objects and what you want to do with them, there isn’t any gain whatsoever in doing that in one object, except for what you think you want to do, which I’m not even certain will sound how you want. Your question is still very, very general, and your assumptions about ‘decomposition’ and ‘useful’ very idiosyncratic… So just try it. You’ll see, all these approximations fail differently. Each has a sound.

As for general answers to decomposition and clustering: if Google, with warehouse-size computers, boring sources and a small number of models, can barely do proper demixing, don’t expect it to work with what you have in your hands. Flip the idea: take the circuit now available to you and bend it, not the other way round. I think we’re at least 15 years away from good general decomposition, and I’m the ever-optimist… Check the state of the art, and if it sounds bad with their means, it’s because it is not possible with yours… but many, many, many other things are possible with these new things. I for one very much like the JIT 3-component nmf crossover, and it ticks all the boxes of the words you used (musical, useful, inspiring, etc). I also love simple things like using pitch confidence to route stuff… There is a breadth of sonic experiences available with nmf, with or without deciding on the number of components. You can do both. Enjoy!

I think we’re slightly talking past each other.

Part of what I’m asking is the literal nuts and bolts of what nndsvd is doing, since the help and reference are super minimal and ambiguous in places. So I’m not really sure how it’s supposed to work and how you use it in conjunction with fluid.bufnmf~, in a literal sense. As far as I can tell, chaining them together is computing the activations/bases twice, or refining them.

My questions and qualifiers/adjectives around @coverage (good/unique) are only to try to understand what “preserved” means in the reference file. I have no musical opinions here. I’m just trying to ask what is being preserved and what preservation means in this context.

The second bit is more about trying to leverage what I think nndsvd is doing (as you initially explained it a few months ago), since it looks like a useful way to have it tell me, algorithmically, how many components I should break any given buffer into.

With fluid.bufnmf~ I can specify this already, and I can do this by whatever means I want, but in terms of my workflows and interest, I’ve not really wanted to oversplit and combine etc… because I don’t generally find that kind of stuff too interesting (again, this is a personal taste thing). So other than arbitrarily splitting things into two, or 500, I’ve not done much beyond that.

What I’m trying to do at the moment is be able to take an arbitrary buffer/sample and break it into n components. If I can use nndsvd to give me a rough idea of what n should be, great!

OK, 2 ways: you put in a sound and ask for a computer estimation (by any of the 4 algorithms) of how many components you need, then you run an nmf with that number of components. That is a ballpark and it helps.

The second way is the same but more advanced: the object also gives you bases to start training nmf with, so you take those to start the process. I think it sounds better when you allow them to update from there, but both the updating and non-updating modes of nmf will give you something different from nmf’s random seeding. All three are fun, so I usually use them all to make multichannel sounds :slight_smile:
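
If it helps to see the two workflows outside Max, here is a minimal sketch in Python with scikit-learn, as an analogy only: the actual objects are fluid.bufnndsvd~ and fluid.bufnmf~, and sklearn’s init options won’t map exactly onto the object’s methods.

```python
# Sketch of the two workflows, assuming scikit-learn's NMF as a stand-in:
# (a) run NMF from a random seed with an estimated component count, or
# (b) let an NNDSVD-style init supply the starting bases/activations.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
V = rng.random((513, 400))            # stand-in magnitude spectrogram (bins x frames)
k = 5                                 # e.g. a count estimated as in the earlier coverage sketch

# (a) random seeding
nmf_random = NMF(n_components=k, init="random", random_state=0, max_iter=500)
W_rand = nmf_random.fit_transform(V)  # bases (bins x components)
H_rand = nmf_random.components_       # activations (components x frames)

# (b) NNDSVD-style seeding (sklearn also has "nndsvda" and "nndsvdar" variants)
nmf_seeded = NMF(n_components=k, init="nndsvd", max_iter=500)
W_seed = nmf_seeded.fit_transform(V)
H_seed = nmf_seeded.components_

# neither factorisation is exact: W @ H only approximates V
print(nmf_random.reconstruction_err_, nmf_seeded.reconstruction_err_)
```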

there is an implied strong personal bias in this question. It’s like asking for good food: there is no single answer, since how you divide into which components, for what use, is what is not clear, and very personal…

If you speak of auditory streams for instance, the science is not there yet. @weefuzzy and @groma will point you at the cutting edge, but nothing has convinced me, even in very, very simplified focused problems such as vocal demixing. There are some solutions, they require training, have artefacts, and are far from realtime.

This is what I mean: what does “need” mean here, vs what I’m saying in the second bit?

Obviously there are tastes and all sorts of things when it comes to sounds, but nndsvd is doing something here with @coverage. I just want to do that.

Unless I’m missing something, it’s not possible to ask only for an estimation of coverage; you can only ask for the activations/bases to be computed directly, which it does quite quickly.

I’ve not gone as far as comparing the difference between the pre-seeded nndsvd activations/bases and using random seeds. I would imagine it would probably sound better in nearly all cases, but computation time is something I’m thinking about. I guess there wouldn’t really be any difference here if all that is being changed is the seeds that are fed in, is that correct? So running a vanilla 5-component nmf vs running an nndsvd->nmf 5-component thing would take (roughly) the same amount of time?

It does something closely related to PCA to estimate how many components would account for how much of the variance in the source spectrogram. It will tend to suggest lots of components if you’re aiming for high coverage, because the Singular Value Decomposition (SVD) will concentrate as much variance as possible in each component, like PCA, meaning you may end up with lots of itty-bitty ones accounting for the last few % of variance (depending on the statistics of the signal).

The process of establishing the coverage involves working out the components anyway (doing the SVD), so there wouldn’t be any computational advantage in pretending otherwise.

Once you’ve got your seeds, it makes no difference to the NMF at all, except that you’d expect it to converge more quickly (to something) with non-arbitrary initial conditions, and so you can almost certainly turn the number of iterations down. On the horizon, I think, is to bring NMF more into line with the MLP and let people have an idea of the loss over time (making it easier to have an informed idea about how many iterations you need for a given desire), but perhaps also to add an early stopping threshold.
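
To make the convergence point concrete, here is a small sketch with scikit-learn’s NMF, again only an analogy for the FluCoMa objects (tol and n_iter_ are sklearn’s names, not FluCoMa attributes): a tolerance acts as the kind of early-stopping threshold mentioned above, and you can compare how many iterations a seeded run needs versus a random one.

```python
# Sketch: compare iteration counts for random vs NNDSVD-seeded NMF, with tol
# acting as an early-stopping threshold. Results depend on the data; seeded
# runs typically stop sooner.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(2)
V = rng.random((513, 300))

for init in ("random", "nndsvd"):
    model = NMF(n_components=10, init=init, tol=1e-4, max_iter=2000, random_state=0)
    model.fit(V)
    print(init, "iterations:", model.n_iter_,
          "error:", round(model.reconstruction_err_, 2))
```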


:+1:


:rofl:
