@harmfiltersize for small buffers (HPSS)

Don’t know if this is a bug or just my lack of understanding, but the way @harmfiltersize and @percfiltersize work in fluid.bufhpss~ appears to be inconsistent.

The reference file says this of harmfiltersize:

The size, in spectral frames, of the median filter for the harmonic component. Must be an odd number, >= 3.

And this of percfiltersize:

The size, in spectral bins, of the median filter for the percussive component. Must be an odd number, >=3

I don’t know if there’s a meaningful difference between “frames” and “bins” here, but the time scale seems odd.

So take the example pasted below. I have a buffer of 50ms. If I run the default settings (@harmfiltersize 17 and @percfiltersize 31, with default FFT settings) I get nothing in the harmonic component. If I bring @harmfiltersize down to 11, I do get some results.

Where I get confused is that the “usable range” for @percfiltersize is far wider. And the reference file (or help file, from what I could tell) fails to explain what, if any, relationship there is between these two parameters and/or how they interact.

I imagine that the FFT settings also have an impact here, although I can’t tell (again, from the reference or help files) what the default FFT settings are. The default is shown everywhere as 1024 -1 -1. (From some trial and error it looks like the default hop size is half the FFT size, but that should be stated somewhere.)

So:

  1. Why is the default @percfiltersize so much bigger than the default @harmfiltersize? And do they operate on the same units (of time)?
  2. What is the “formula” I am looking for here, to figure out the filtersize settings for my smallest possible (time) window? (hop size (512) * frames (17) = 8704 samples here, which comes out to almost 200ms unless I’m mistaken)
  3. Is there a general “best practice” when trying to HPSS-ify small buffers?

(As an aside, the reason I’m operating on tiny buffers is the refactoring I discussed in this thread. I could conceivably clump a bunch of these tiny buffers together, then “bulk” HPSS them, as a way to potentially fake the solution. This approach would either add some more latency, as I wait for enough onsets to arrive before processing, or mean reprocessing the larger buffer each time a new onset arrives, which would sort of duplicate work.)


----------begin_max5_patcher----------
3292.3oc0cszjqhaEdc2+Jn7xb6wkdfjfr5ljESRpjJKlTUpTSMUWXa4tYtX
vEO5GYpo+sG8.vfMXjwH2b6E1zBIz47cdnijNV7a2e2hUIuwyV37Gc9Ym6t6
2t+t6TEIK3tx++tE6BdacTPlpZKVmraGONewC56kyeKWUNZoylDdlSbRtinN
6Kx4U0YeP95mCie5wT95bce45CWBdvw0S9IF.TeQWBb9kx1DWrKLNhmq5TXY
gaShy2Frl2rLQESJxqpInrzvMJhJY0u9CDzhFMOK7+oZNV1kGJNNXmp3E+Ud
zK77v0AKj272u+d4GObkXSXrSJOHJL+cm7mCybdMoHZiyJty9hnH9FmsoI6b
BhcBhdM38LQcWmjtQ.XN+8+1+1YUw1s7zpmYTXr3tEwpGLpe.F5hjXpm2RxC
NHHSAv3g.3AASWuwfKqJxyShOi5.UQr9X4mH2pOuDZE1AsBNzioB4aNO8Qdb
vpHdSlS+LxeeOWSJKVED+zBmeYDrYL+UQ+dhzO2YkCbPlGhUeQgJAF5x3dzo
bO1awY4vGjpk4SJit2IaXgLjRqzEub9rCoL1sW9LK7o3fnE0cfhlDlRZFqjy
D1+gQ7W3oYgBUzC8wcKB1uuQw20nIR33WSTOHuGpKJLVWDrtnT9KgUsGTWZP
p.YxEvRQp1myazJdP9XR1vSiKBUOIcgBASIIoDAReUY6KcCpjTU29.ViUFQL
eEh6QTHOABNf1BQ7SQIq+FeSShaQxddbX79TdlvOVP9Qz9hM7sAEQ4O1zSJD
sry6W4ntyaV6u8OkFJDQUU4ozvMIwRhnkjPVbU2IbsQTbEoIynpQbv9NZrPI
Q.K8byLASVjsJHUJnJ8LfptYdRRT6aU2tH917xauOLN9HTLOYe+2LM7omOSa
WkHt4ty8rU2I6whX8ceTnSj+XVvKsQ67fnnR6z1O92BhC2EjKFjSKBPf5ap8
N9b15zjnnV7q9Nuzwc1HzwWyeMbS9ypNpoxfn5g6qThVTKk2D9DOKucY4AOk
0tjr720fdihJVUZC+XNe29HAWztBshlooAaSOZsJ+bd1Z6cacT35u8Qya00v
tdZKNkAmK7Xeb85mqeecm3uC0jD52mWUcTt1K8hL8fxe3CG3R2A.El1iOQGx
GZXPAcwfBXNAJYur8CGOHPZLPFBaf5vcPJmZd3gAGbufia2fCx2Dv4gyd0MS
eRfYvAfLWW1RhuuOiBz+gnBSNWcHtDKncQOK9sMJIH+lAPewwULaoQgPLh0r
+fvYDDkFDuQLeJw3ZiBlPpIixrghD779tODO9swPC.FxTCiUSGAV5512FXBd
VYcAVxLDSziwi8rAlP9dzbpBXPL6Y.wlMFPhIf8Mg1hOTD.jyOHTab.NPwLr
TWaFRg.54mQACCUzKEpPmOjnaFNstH8E9GFBH5gu0CkO1nfP8fGPCiBpw5+L
Oh2AJFYRONk7C4eL4xEYu3cXrYkKYHdv.d5FhnZsIWa.QdeO5gtaXBZu.dX9
emEvimmqZLLW6EvCycdYcszTLgYu.dXyv.dvCZNUALP6EvCiN2B3QDuSc3ND
c3NTpg.kud0wHLKDuC086n3cJAjxU8wJw6PIyx3c9vv0Cih0Kjp+waG1TrdN
34zpEFmDlwMDVpVXYpEVXY57ZgkkCWOzHSHWzRFoNXXqrzVn4TjdFMcAwzII
1cBBnY1DDHCOAgRPgBs1TBPeWtnMU.CDasXXPymXX9hi2RS8oP04lDzFHB96
QUkZjwl5Jtyq3cwHJTDoKvwEBbHxfcIJSFmxU5iXzVeBKyrInA9c7uXHCMOB
78KBvBBLCOnMx9uodIhmOK4PokEDXHpfsmcEz61hJsx3udG6gParhBmj1eSP
LsvilYZuoA3foBn80VxcVYjZB0uAbMsSA.dduuyrYJpmDjmdxyLlMrZlUoUh
dkEb7LyaR4doPcsvZK3MGWZg0uuNhanRid0EbsgmVxMWkQ2IlkCVdt5Dtu5y
w37.zSbHMofpLguY14ope7F9aMRs0IA.TD8P7uNBLhorO3h8UXhqyqEgNjvq
xeG.8joiJtWd+tgtrjhz0UzUIk6zFI1vyxCiqS+2e9vnCxJZjj6hIBfgTAFX
Sp.YHUPsIQfMUfPrIU3ZJUbJlkjtQml8nOYRi0Oog+jIMu9IM5mKogb6mzb+
jIMZ+jF4ykzvmwLf84RZryfZvOYRyueRCXGRiXptlUGngNKFzkYJUfsIU3YJ
U3aSpv2T8BjEoBjogAgf1jJLNjPI4BsEUXZvXd1DJvWhZAyVTgwCPZUrfbIX
A0VTgotNQLahEl55DYyfzQl55DYSG3xefQNHSl0jMoBrwtNsIQflCSikbItu
sEQPMEJP1bXcJdNHPnl5+lZSuETS0KTflsFVmY7bCrIVvLFKrY.3LSGKiYUp
vzwxX1L5Bl2kHQbsEUX5z.X1L5BSgBq5svzAQr4LQfvKY15cqTbXQyUGo.8r
p4GNQJJ1Dl7SpyEhG+m73hCKNe0IXQ2z9pm1FFEsNIJIsQEZTi6VTsg.Kz05
gF2ptc+rCXIx2EB8DL9RLDyj+pTEWItfPTtkOocvpF5Rb8AHY0coHlKQckGB
Svxq.czTzg9D.808Dv2C3puRTDTzmG2zf3mJOqLXGNvGzB3zj8Io0GbHKw9s
ZWQdxSoAaBK24i5k34f75vochdG6E07zSBB4N.IDDgMeFcotUKT+GEqCrgfr
lYNir7Bv01RSuZYnGCP77zW44h6so0RSJQzYx5Sk+wzWoeZ2Dg4CMgyV3gOv
Ex7UrhOwkoHMOWjOSwnDZKa8f0qEO7VsGiD53JPzshGkVFxxNhyZKGbYDHV0
GHernqUWQf.ZGMjGw2cb+x7D3nVX.fdd5qDj.BcZ6yDOf0RXq8S.I5LEACoL
j1hzC.ftm9.japspsOFFKOCh30ZSXeDjpo8JScT2pDsNWyDdI+W64wN+TPbl
yOw2EtJIZSqsg1BFceKKJbCO8+HOsiL03qAhoxQrVeXY5cWvazFGmRC6mneh
UlmfCnHqMmqkmrRuCFqJn5kpOtMfyQaztwRRg0mXLHjeSiUodqKAdqH8c7rr
fm321w.Z3IuxK7oWMfmblmb76G57pta5rYDc6oEepk5kqQbbZybAZyJ5sQrV
BzE6eKzj0o+zeNpf2LeSNOk20XnPF0UGTPMS3KKBdCYhzjWiuJt3.sSp8ofu
wbwe48fqiI.HQTP5A.XXLR6Y.AgP7siI9wTN+J4BkT.K+SGeiLz0aGC7e4QQ
IudUbfmHVPl12JzCp7GQDRFR2gU03vHDIqH.3YaFsX2Jd5OJhZ6pXSo4APot
QDAdPTbLr4XBVkIR0ADJ4hQEjkAzeeg+ScoTreo0EqbZGB4q+MwN6.eaWoWe
buOfAApp6ico.+xIcwfjIi6KeFUOfExCjxMOpO7MeLHOOMbUQtdwWZvnW1gD
4SQIqBhJOBHqA2NNCIu+vRJLQGgyvkNqS4A4bm.m7v32cDSpYvy51xefK9LK
d7ViucGu0uJDoaSR28wYYbcBmyzeo+EdP68vblTVn9PttlL2ySWWjkE9R8YH
dabf1AN.58b.V+6o6gydQTXl56wcfHeIHC.cMHyyAo6RhCWaLtTe5ONewEk0
g9TCarvR4BGaLn38IBJCdZnWd33CQnI87Pu9j16S97PeaTQ3lkBA3y6yx9v4
qZYmS4WesRC2o9hudvcfyodFNC.pOpwcQjwbViiOEA8G7HUuVEYRQLsx9GWD
yiz+r5DC6OFluiCTdCLYlbUkJF+Xudmgs0tW88mHtl8Iv052+Ce3z1oVutNK
8Y5M.KiM98lvvGn9SuDtz5m.Flc80uGA.nIRDCsoHVF4aQ3YXJl9zPR+h5Pd
RVLMuVDP8xTGdeHHos5fJ1FFkKeaGHhezhrIDp+oNOY7I9R3So+yt4SUca+i
pRy1Gug0kL+oI9UqrZomMotQRbbB9ZbG4snwl.a6NZHNBOE8jKvjdxaB5o1+
zp5om7mfNBci5mwwOZs8id+lH6hidulbz6zjSeelz+6xjieOlnVWfiR.CMqN
PhWbzhjTBDcr+LUqnP2IYwXRvhQlbEiLwJ5dKX5e6W5YqWpV0iGtu+0vo852
zTuqiDlX7Bfi2Xrws8TiLoHFWBQLYBgpooNtjeX7I9vnR5gqIgGtpjc3pSzA
iSxgq1f3zjYnaCCCRfgqlVZs8m8YeZTxILtDS3R2N2Iggaj7ACh7lkvASBY0
JwBlZekWbBDLxjGXjdlmXmkSjVUaqCyjhMSF.CztFJA.tZMqS2n+topwt49S
EA1bS7MlBMZi6mHJr4FzaLAZzlxOQDXqMe2bJbfMbehHt1artwTmQal942H8
qmANYCyMl7MZSxuZB7jMC2fAxMdKfGwlee0Lzoax8zg3cwUFso1WFWomXbOu
kMu+2u++C5NTxSC
-----------end_max5_patcher-----------

Hopefully, @weefuzzy’s new HPSS explanation will help you understand. Bins are in the spectrum (frequency), and frames are in time. In that model, a ‘line’ is a run of consecutive elements with energy. For percussive content, it is a line in the spectrum: consecutive frequency bins within a single frame make a ‘vertical’ line in the spectrogram. For harmonic content, a steady note makes a ‘line’ in time: the same bin (more or less) stays ‘on’ for some time, making a horizontal line in the spectrogram.
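
Put another way, here is a rough numpy/scipy sketch of the idea (illustrative only, not the FluCoMa implementation): the harmonic filter runs along the time axis of the magnitude spectrogram, the percussive filter along the frequency axis.

```python
import numpy as np
from scipy.ndimage import median_filter

# Stand-in magnitude spectrogram: rows are frequency bins, columns are STFT frames
S = np.abs(np.random.randn(513, 100))

harm_filter_size = 17   # @harmfiltersize: measured in frames, i.e. along time
perc_filter_size = 31   # @percfiltersize: measured in bins, i.e. along frequency

# Harmonic estimate: median over neighbouring frames, per bin (smooths horizontal lines)
H = median_filter(S, size=(1, harm_filter_size))

# Percussive estimate: median over neighbouring bins, per frame (smooths vertical lines)
P = median_filter(S, size=(perc_filter_size, 1))
```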

Does this help? It should explain why there is such a difference between the widths of the two filters.

As in, one that @weefuzzy is going to put out, or is it something somewhere I can see?

Ok, that kind of makes sense. That could be made clearer in the reference/helpfiles as it’s definitely useful information to have.

In doing some testing with the realtime version I found that I could go as low as @harmfiltersize 5 and @percfiltersize 9 and still have useful results. I should test to see how small a buffer that actually corresponds to, to make sure there isn’t some edge case which ‘fails’ in the patch.

I’m having loads of fun tuning these by ear, depending on what I want on which channel :wink:

Indeed, something in progress, with hopefully enough visual explanation to make it clearer how the filters interact, and quite what is happening.

The length of time that makes sense will mostly be determined by the harmonic filter size, and this translates into a number of STFT hops. So with @harmfiltersize 5 and a hop size of 512 @ 44.1 kHz => ~58ms as a total minimum (i.e. 5 hops). However, the behaviour at the boundaries will have more effect the smaller your sample is. Intuitively, I suspect working with slices that small might make the harmonic parts a bit jumpy from analysis to analysis.
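
Checking that arithmetic (a small sketch; 44.1 kHz and the default hop of 512 are assumptions):

```python
sr = 44100             # sample rate (Hz), assumed
hop = 512              # default hop: window size / 2 for a 1024 window
harm_filter_size = 5   # @harmfiltersize, measured in STFT frames

min_samples = harm_filter_size * hop    # 2560 samples spanned by the filter
min_ms = 1000 * min_samples / sr        # ~58 ms
print(min_samples, round(min_ms, 1))    # 2560 58.1
```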

As for how the two filters interact: for each cell of the spectrogram (i.e. a given bin in a given STFT frame), the two median filters give respective estimates of how harmonic and how percussive that bin is. In mode 0 these estimates are essentially normalised with respect to each other, so that the total energy of the bin is assigned in proportion to how ‘lit up’ each filter was for that bin, which is why the process always null-sums in that mode.
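
In pseudo-numpy terms, the general soft-masking idea looks something like this (a sketch of the principle, not the exact FluCoMa code):

```python
import numpy as np

def mode0_split(S, H, P, eps=1e-12):
    """Share each bin's energy between the two estimates so the outputs null-sum.

    S: magnitude spectrogram; H, P: harmonic and percussive median-filter estimates."""
    harm_mask = H / (H + P + eps)        # proportion of the bin judged harmonic
    harmonic = S * harm_mask
    percussive = S * (1.0 - harm_mask)   # the remainder; harmonic + percussive == S
    return harmonic, percussive
```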

However, that does mean that if, for whatever reason, the harmonic (resp. percussive) estimate for a bin is 0 but the percussive (resp. harmonic) one is > 0, then all the energy gets allotted to the non-zero one. When you have a long harmonic filter size relative to the number of frames you’re actually analysing, it’s quite possible you could end up with a lot of 0s at the edges, because (IIRC) we assume that the data ‘before’ and ‘after’ the buffer is all 0s, which will drag the estimates in the early frames down to 0. It’s possible we should revisit that assumption (it’s a harder call than what to do at the edges in frequency, where we ‘know’ that the data mirrors, because of how the Fourier transform works).
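
A tiny scipy illustration of that edge effect (assuming zero padding behaves roughly like a median filter in 'constant' mode; the actual processor may differ in detail):

```python
import numpy as np
from scipy.ndimage import median_filter

frames = np.ones(8)   # a short, steady 'harmonic' bin: only 8 frames of data

# Treat everything outside the buffer as 0 vs. mirroring the available data
zero_padded = median_filter(frames, size=17, mode='constant', cval=0.0)
mirrored    = median_filter(frames, size=17, mode='reflect')

print(zero_padded)    # all zeros: the padding dominates a filter longer than the data
print(mirrored)       # all ones: the harmonic estimate survives
```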

Finally, on the FFT settings documentation: for some reason, I was convinced this was all now spelled out in the reference, but my computer disagrees, so I must have dreamt it; sorry. It’s easy enough to add some boilerplate explaining the defaults, so I shall do that (but yes: the default hop is window size / 2, and the default FFT size equals the window size).
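
So the “1024 -1 -1” defaults resolve like this (a hypothetical helper just to spell it out; the names are illustrative, not an actual API):

```python
def resolve_fft_settings(window_size=1024, hop_size=-1, fft_size=-1):
    """-1 means 'derive from the window size', per the defaults described above."""
    if hop_size == -1:
        hop_size = window_size // 2   # default hop is half the window
    if fft_size == -1:
        fft_size = window_size        # default FFT size equals the window size
    return window_size, hop_size, fft_size

print(resolve_fft_settings())         # (1024, 512, 1024)
```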


So is it zero-padded, or will it take from past the edges?

Just thinking now: if I copy the appropriate audio to the concatenated buffer first, and then fluid.bufhpss~ the small segment from there, would that minimize some of these problems?

After some more testing, having a bigger @harmfiltersize sounds better for what I’m doing, and cranking down the hopsize only works so much.

Otherwise, I’m kind of back to square one in terms of just brute-HPSS-ing the entire concatenated target buffer to alleviate these sound problems. The “whole” buffer is only 1s long though, so not the end of the world, but I was hoping to avoid that approach by chopping/slicing/analyzing only what I needed. I just didn’t account for Mr Jo Fourier hating little windows of time so much…

I think the 0s assumption only applies when the actual edge of the buffer itself is reached – if you’re analysing a segment within a buffer, it would seem sensible for it to make use of the actual information available about what happens before and after. I’ll see if I can confirm that this is the case.

Blame Heisenberg, not Fourier ;-D

Interesting. I was under the impression that @tremblap was all about the zero-padding or whatever, outside the analysis window (coming from all those edge case discussions about descriptor analysis windows).

I don’t think I agree with this. We will talk about it further, but the rationale is quite simple: if you want wider info, widen the query boundaries. The argument with zero-padding was that a slice is a self-standing entity surrounded by digital silence.

You could also fluid.bufcompose~ what you want into a ‘digital silence’ context if you really wanted :wink:

Both kind of make sense, and I guess it’s tricky when you’re dealing with small fragments.

Also makes me wonder about future (TB2) stuff: how these kinds of processes would work in a granular/mosaic-y context if the “grain” has to be >50ms to play nice with the analysis tools. (I guess the workflow/paradigm could be to bulk pre-analyze buffers into their constituent components and then deal with fragments of those, but that won’t always be possible.)

I guess this gets a bit too quirky and opaque, but in the odd edge case where the selected segment becomes too small to be useful given certain window settings, the process could look at the material outside the boundaries to give a more “accurate” answer.

Or maybe just having a @greedy flag, or even better, a @boundary flag, that lets you decide what you want to happen. You know, so it’s not a “black box” :stuck_out_tongue:

(and/or perhaps if @...filtersize parameters < hop/fftsize, it can throw up a yellow warning or something)

indeed :wink:

I presume that we would need to add, to @groma’s disapproval, another attribute for edge cases. It is on the radar for now, but not super urgent.

You still think it is the tool’s problem, but I will remind you once again: you cannot know the pitch of something smaller than the wavelength. So it is a design problem from the user’s perspective: what you want to know, and for how long after, is yours to decide, as it is unpredictable. MIDI basses hit that wall long ago: you need periodicity to call for a frequency, and low notes are long…
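
To make that concrete with a rough illustration (the numbers are assumed, not from the thread):

```python
f0 = 41.2                    # low E on a bass guitar (Hz), just as an example
period_ms = 1000 / f0        # ~24 ms per cycle
buffer_ms = 50               # the tiny buffer from earlier in the thread

# ~2 cycles: very little periodicity to estimate a pitch from
print(round(buffer_ms / period_ms, 1))
```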

Yeah, I guess in my case having a @greedy 1 flag (or whatever) would solve the problem, since the samples would exist in a larger context, and information could be drawn from there.

So does it then follow that some decompositions enter unanalyzable territory? (e.g. the transient output from any of the .transients~ objects) I guess one can use tiny FFT sizes and/or just be limited to loudness and some spectral moments.

Was mainly just wondering, out loud, if these kinds of interface-y issues will all come down to the user (“you can code it yourself”), or if there is a game plan for dealing with this kind of stuff (again, not in an impossible physics/math way, but in an interface way).

I’m sure @weefuzzy has a few ideas up his sleeve, but again, so many conflicting use-cases make a blackbox approach inappropriate…

Not advocating for a black box, rather, wondering/hoping that the tools/interface will consider these kinds of use cases and have stuff/objects/whatever to deal with them (in an open way).

That whole discussion was about descriptors; I’m not sure we examined it in the same way for decompositions. It seems pretty clear cut to me.

I know you don’t, and I really appreciate your questions and challenges. My aim with this project, from the start, has been to empower creative coders through a balance of knowledge and tools. That balance is very hard to strike. I even disagree with the thresholds, depending on which disposition I am in.

So keep sending ideas, challenges, and requests, and we’ll keep refining where the threshold lies for this project, whilst also pointing at other options out there with more closed or more open implementations. The REAL agenda of this project is to create a community where such discussions can happen.

Having slept on it, I’ve decided I agree with @tremblap, and you should get what you ask for with temporal analyses. That said, we can discuss the feasibility of offering a boundary attribute in the same way as the Python median filter does.

Meanwhile, though, it’s worth proceeding on the basis that the harmonic filter needs size/2 + 1 frames to settle, and padding your analysis accordingly for these really short cases.
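
As a rough back-of-envelope of what that settling requirement means in samples (assuming the default hop of 512 at 44.1 kHz, and integer division of the odd filter size):

```python
sr = 44100
hop = 512                  # default hop: window size / 2 for a 1024 window
harm_filter_size = 17      # default @harmfiltersize, in frames

settle_frames = harm_filter_size // 2 + 1    # frames before the filter has settled
settle_samples = settle_frames * hop         # 9 * 512 = 4608 samples
print(settle_frames, settle_samples,
      round(1000 * settle_samples / sr, 1))  # 9 4608 104.5 (ms)
```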


This means that you are consistent since I take all my wisdom from you and @groma :wink:

Speaking of interface design and how complicated it is to get right, this is funny: https://uxdesign.cc/the-worst-volume-control-ui-in-the-world-60713dc86950
