Getting the masks for NMF

I sat down this morning to write a patch that would extract the mask for each component as a 2D data structure (a jitter matrix for now).

When I got to a certain point (fairly early on) I realised that I was going to get the basis and the activation, which I believe (unless I have misunderstood) does not give me the mask without knowing the original FFT data, at which point I’d expect to have to do:

Mask[t][k] = (Basis[k] * Activation[t]) / Spectrum[t][k]

Can someone (probably @groma but maybe @tremblap or @weefuzzy) please:

A - confirm my maths, at least.
B - tell me if I’m being stupid and I can get to the mask without needing changes to the objects
C - make any suggestions (including - recompile source as you want it to work) in order to get to the masks.

I am aware that I can jitter FFT all the data inline with the FFT analysis from the object, but before I consider whether I want to go down that rabbit hole (and checking how that is normalised/doing the same windowing blah blah) it’d be good to know if I’ve missed something.

Thanks

OK - brain now engaged:

Mask[t][k] = (Basis[k] * Activation[t]) / Sum(Bases[c][k] * Activations[c][t])

where c is the component (t and k more obviously are time and bins) and the summing is for all c.

Is that right?
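For anyone wanting to try the formula outside Max, here is a minimal sketch in plain Python. It assumes the bases are stored as bases[c][k] and the activations as activations[c][t]. The function name ratio_masks, the small eps guard against division by zero and the example numbers are all made up for illustration, and none of this is taken from the object source.

```python
def ratio_masks(bases, activations, eps=1e-12):
    """Return masks[c][t][k] from bases[c][k] and activations[c][t].

    eps is an assumed guard against dividing by zero in silent bins.
    """
    n_comp = len(bases)
    n_bins = len(bases[0])
    n_hops = len(activations[0])
    masks = [[[0.0] * n_bins for _ in range(n_hops)] for _ in range(n_comp)]
    for t in range(n_hops):
        for k in range(n_bins):
            # Denominator: sum over every component c of basis * activation,
            # i.e. the full NMF approximation of the spectrum at (t, k).
            total = sum(bases[c][k] * activations[c][t] for c in range(n_comp))
            for c in range(n_comp):
                # Numerator: the contribution of component c alone.
                masks[c][t][k] = bases[c][k] * activations[c][t] / (total + eps)
    return masks


# Made-up example: 2 components, 3 bins, 2 hops.
bases = [[1.0, 0.0, 2.0], [1.0, 3.0, 2.0]]
activations = [[1.0, 0.5], [1.0, 1.5]]
masks = ratio_masks(bases, activations)
```

A quick sanity check is that, wherever the denominator is not zero, the masks of all components add up to 1 for every hop and bin.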

Also - am I correct that these masks are totally opaque for other objects like the HPSS? I’m interested in both being able to analyse them, and potentially process and then apply them, but right now it feels like that isn’t very possible with the tools as they are - have I missed something?

Hello, just to be clear, the operation in the numerator is a dot product, which produces a matrix of the size of the spectrum (either using all the bases/acts or just one), while the division is element-wise.
As discussed before, NMFMorph allows you to do the whole operation from the NMF bases and activations (i.e. using the extremes of the interpolation), although it does not give you access to the mask. In the code there is a class RatioMask that is used in NMF; in HPSS it is done in place.
It would be interesting to see how your use case could fit within the current framework. I guess what you are saying would involve processing masks (which are the size of whole spectrograms) in a multichannel buffer in Max?
One thing to consider is that it may be also interesting to do similar things using the bases and activations, which may have an easier interpretation, and then leaving the product for the end.
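As a rough illustration of working on the bases and activations first and leaving the product for the end, the sketch below edits one activation value and only then forms the mask and multiplies it element-wise with a magnitude spectrogram. The function name resynth_component, the eps guard, the layout spectrum[t][k] and all the numbers are assumptions for the example; it is not a description of what the objects do internally.

```python
def resynth_component(bases, activations, spectrum, c, eps=1e-12):
    """Magnitude estimate for component c: mask[c][t][k] * spectrum[t][k]."""
    n_comp = len(bases)
    n_bins = len(bases[0])
    n_hops = len(activations[0])
    out = []
    for t in range(n_hops):
        row = []
        for k in range(n_bins):
            # Sum over all components gives the denominator of the ratio mask.
            total = sum(bases[i][k] * activations[i][t] for i in range(n_comp))
            mask = bases[c][k] * activations[c][t] / (total + eps)
            # Element-wise application of the mask to the spectrogram.
            row.append(mask * spectrum[t][k])
        out.append(row)
    return out


# Made-up data: 2 components, 2 bins, 2 hops.
bases = [[1.0, 0.0], [1.0, 2.0]]
activations = [[1.0, 1.0], [1.0, 3.0]]
spectrum = [[4.0, 2.0], [8.0, 6.0]]  # hypothetical magnitudes, spectrum[t][k]

# Process in the small domain first: mute component 0 on hop 1.
edited = [row[:] for row in activations]
edited[0][1] = 0.0

before = resynth_component(bases, activations, spectrum, 0)
after = resynth_component(bases, edited, spectrum, 0)
```

The edit touches a single number in the activations, whereas the equivalent change to the mask would affect a whole row of the spectrogram-sized matrix.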

I don’t follow the first part, as I’m putting forward the method of recovering the masks from the bases/activations - I am not actually looking at the source code here, so when you talk about numerators/divisions I am assuming you are referring to the NMF algorithm?

It’s possible I can do what I want on the activations and the bases indeed - that would assume that time and frequency will be treated independently and I haven’t quite got that far yet. My first task was to calculate and display the mask - I’ll put that up in a more open part of the forum and link in just a moment. Right now the calculation is slow, but I’m iterating in Max and pushing to the matrix one cell at a time, and I suspect using javascript or similar might give me simpler code and faster speeds.

@groma - see here:

I was referring to the formula in your post, but I guess it is implicit in the indexing if you only use one base. If you use more, then they are summed in the dot product, as in the sum over c of the second post.

Yep - sorry - the Mask should really be labelled Mask[c][t][k] and it’s the value for a single component / hop / bin.

nice, if you modify the bases / activations, this would allow you to see the result as it would happen internally in NMFMorph with no interpolation. I guess we should add a Buf / NRT version of it.
