Segmentation by providing examples

Hello all,

Has anyone come across any good papers or implementations of segmentation based on providing manually segmented training data to, for example, a neural network?

I was reading this and it sparked my interest in the matter:


That’s quite interesting, although the accuracy would probably be limited by hop size stuff as well, as it presently is for the novelty/onset slicers.

It did get me wondering about a “dumb” version of this, somewhere between @tremblap’s “pick how many slices you want” approach and a full-blown NN-based solution: you manually slice/segment a bit of audio, then give that to any of the algorithms and have it iterate through parameters until it gets the same results as you, then let it run on more/new audio and perhaps retrain/refine as it goes. That could take some of the guesswork out of trickier parameters, as well as find workable settings that may otherwise seem unintuitive.
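A minimal sketch of that idea, with everything hypothetical: `detect_onsets` is a toy stand-in for whatever real slicer you’d use, and the “iterate through parameters” part is just a search over candidate thresholds, scored by how closely the detected slices match the manual ones.

```python
# Toy sketch: tune a slicer parameter so its output matches hand-made
# slice points. Only the search loop is the point here; detect_onsets
# stands in for any real slicing algorithm.

def detect_onsets(signal, threshold):
    """Toy slicer: mark an onset wherever the jump between successive
    samples exceeds the threshold."""
    return [i for i in range(1, len(signal))
            if abs(signal[i] - signal[i - 1]) > threshold]

def mismatch(detected, reference, tolerance=0):
    """Count reference onsets with no detection nearby, plus spurious
    detections with no reference nearby."""
    missed = sum(1 for r in reference
                 if not any(abs(d - r) <= tolerance for d in detected))
    spurious = sum(1 for d in detected
                   if not any(abs(d - r) <= tolerance for r in reference))
    return missed + spurious

def fit_threshold(signal, reference, candidates):
    """Return the candidate threshold whose slices best match the
    manual segmentation."""
    return min(candidates,
               key=lambda t: mismatch(detect_onsets(signal, t), reference))

# A signal with two clear jumps, manually segmented at those jumps:
signal = [0.0, 0.15, 0.9, 1.0, 0.95, 0.2, 0.25, 0.3]
manual = [2, 5]
best = fit_threshold(signal, manual, [0.1, 0.3, 0.5, 0.7])
```

With a real slicer this would be a grid (or smarter) search over its actual parameters, but the shape of the loop is the same.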

This sounds exciting, but I would not know how to do it yet… I know that @weefuzzy has manually segmented many complicated files a few years back, ready for training, but on what to train them is my big unknown…

edit I just had an idea which is in line with my interactive heuristics view of life: what if you

  1. over-segment like @jamesbradbury does in his thing, until you have all the slices you want, but also a lot of false positives.
  2. listen and decide what parameter makes you reject the false positives
  3. guesstimate a sort of descriptor space for those
  4. train on this

then you can run a first pass (over-split) and a merge pass.

Does it make sense?
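Steps 2–4 above could be sketched very crudely like this: describe each candidate slice with a couple of numbers, label a few by hand as keep/reject, then classify the rest by nearest neighbour in that descriptor space. The descriptors here (segment length, relative loudness) are placeholders for whatever turns out to actually separate the false positives.

```python
# Minimal keep/reject sketch: nearest-neighbour classification of
# candidate slices in a tiny hand-guessed descriptor space.

def nearest_label(point, examples):
    """examples: list of ((descriptors...), label). Return the label of
    the closest labelled example (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(examples, key=lambda e: dist(point, e[0]))[1]

# Hand-labelled candidates: (segment length in frames, relative loudness)
labelled = [
    ((120, 0.9), "keep"),
    ((140, 0.8), "keep"),
    ((6, 0.1), "reject"),   # tiny, quiet slivers were false positives
    ((4, 0.2), "reject"),
]

# New candidates from a fresh over-segmented pass:
candidates = [(130, 0.85), (5, 0.15)]
decisions = [nearest_label(c, labelled) for c in candidates]
```

The “train on this” step could of course be anything from this kind of nearest-neighbour lookup up to an actual NN; the point is just that the manual rejections become training data.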

I think part of the key is getting the slice → feature map right. I thought about trying to discover a connection between on/off and the neighbourhood of spectral frames, but I’m 99% sure that approach would give a kind of slicing that ignores the bigger picture. Something like a novelty kernel + sliding window + neural net + training data is in the ballpark, perhaps.
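The sliding-window part of that could be sketched like so, with everything simplified: per-frame features are single numbers here rather than spectral frames, and the hand-written left-vs-right difference score is exactly the bit a trained neural net would replace.

```python
# Rough novelty-via-sliding-window sketch: score each position by how
# different the frames to its left are from the frames to its right,
# then call the local peaks boundaries.

def novelty(frames, half=2):
    """frames: list of per-frame feature values (a real version would
    use spectral frame vectors). Score position i by the difference of
    the means either side of it."""
    scores = []
    for i in range(half, len(frames) - half):
        left = sum(frames[i - half:i]) / half
        right = sum(frames[i:i + half]) / half
        scores.append((i, abs(right - left)))
    return scores

def peaks(scores, threshold):
    """Positions whose score exceeds the threshold and both neighbours."""
    out = []
    for k in range(1, len(scores) - 1):
        i, s = scores[k]
        if s > threshold and s >= scores[k - 1][1] and s >= scores[k + 1][1]:
            out.append(i)
    return out

# A feature sequence with one clear change halfway through:
frames = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
boundaries = peaks(novelty(frames), threshold=0.3)
```

The window size (`half`) is where the hop-size/resolution trade-off mentioned above would bite: bigger windows see the macro picture but smear the boundary position.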


I was thinking more, like I said, to over-segment and then sieve with other descriptors (the length ratio of neighbouring segments is a cool descriptor, or relative loudness, or whatever I discover makes me hate the over-segmentation…)
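That sieve pass might look something like this sketch, using exactly those two descriptors as an assumption: merge a segment into its predecessor when it is both much shorter than that predecessor and relatively quiet. The thresholds are the “whatever makes me hate the over-segmentation” knobs.

```python
# Sieve sketch: per-segment descriptors (length, loudness), merging
# short quiet slivers leftwards into their predecessor.

def sieve(segments, ratio=0.25, loud=0.3):
    """segments: list of (length, loudness). Merge a segment into the
    previous one when length < ratio * previous length and its
    loudness is below the loud threshold."""
    kept = [segments[0]]
    for length, loudness in segments[1:]:
        prev_len, prev_loud = kept[-1]
        if length < ratio * prev_len and loudness < loud:
            # merge: extend the previous segment, keep the louder value
            kept[-1] = (prev_len + length, max(prev_loud, loudness))
        else:
            kept.append((length, loudness))
    return kept

# Three real events with two quiet slivers from over-segmentation:
segments = [(100, 0.9), (10, 0.1), (80, 0.8), (5, 0.2), (120, 0.7)]
merged = sieve(segments)
```

This is the hand-tuned version of the idea; the trained version would learn the merge decision from the manually rejected slices instead of hard-coding the two thresholds.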

This kind of thing I find super interesting.

I’ve not used the novelty stuff too much as it’s either too big to be useful to me, or too small (and vague with regards to tight onsets) to be useful.

So something that marries the macro-scale that you can get with novelty-based slicing but with a tighter underlying segmentation would be amazing.