Question: voice source separation

Hi guys!
What would be the best strategy to separate a soprano from an orchestra (mixed together in a sample)?

As a utopian goal, I would like to end up with one buffer containing the whole voice and one buffer containing the whole orchestra. I know that reality is harder than this :slight_smile: And I know that neural networks are better suited to this task than NMF, but fluid.bufnmf~ is so much handier :wink:
I've tried it with different parametrizations (and ranks), but I haven't been able to isolate notes reasonably well (the vibrato doesn't help; the sample is also time-stretched, but that doesn't seem to make a difference). I think I can more or less simulate what I want by splicing together different portions of the processed buffer, but I was wondering whether there's a better strategy.

Here's a portion of the source sample:

I'm starting from the "basic example" tab in the help file, modifying the rank (from 2 to 10), the iterations (increased to 150), and the analysis parameters (I tried increasing them, unsuccessfully).

If anyone has a hint to make it better, that would be cool!
In the meantime, I'll still keep trying to tweak parameters :slight_smile:

Best,
Daniele

Hi @danieleghisi, good to hear from you!

This may well be beyond the talents of NMF, especially if you don't have some isolated samples to make templates from. Basic NMF like this can't group different pitches from the same source, so you'd want as many components (set with rank) as each source has discrete pitches (and, yes, vibrato complicates this).
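To make that point concrete, here is a toy sketch in Python, with sklearn's NMF standing in for fluid.bufnmf~ and with entirely made-up spectral templates and note timings: even at rank 2, NMF pulls apart the two *pitches* rather than two *sources*, so a melody of N distinct pitches uses up N components on its own.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_bins, n_frames = 64, 100

def template(f0):
    """A crude harmonic spectral template: peaks at f0 and its multiples."""
    t = np.zeros(n_bins)
    for h in range(1, 5):
        if f0 * h < n_bins:
            t[f0 * h] = 1.0 / h
    return t

# Two "notes" from the SAME (hypothetical) source, sounding at different times
W_true = np.stack([template(5), template(7)], axis=1)  # (bins, 2)
H_true = np.zeros((2, n_frames))
H_true[0, 10:40] = 1.0   # note A active in frames 10..39
H_true[1, 50:90] = 1.0   # note B active in frames 50..89
V = W_true @ H_true + 0.01 * rng.random((n_bins, n_frames))  # noisy mix

# Rank-2 NMF: each component latches onto one pitch, not one source
model = NMF(n_components=2, init='nndsvda', max_iter=500)
W = model.fit_transform(V)   # learned spectral templates, (bins, 2)
H = model.components_        # learned activations, (2, frames)
```

With a real recording, every discrete pitch in the soprano line (plus vibrato smearing) competes for its own component in exactly the same way, which is why raising the rank alone rarely yields a clean voice/orchestra split.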

Unfortunately, I can't play your sample for some reason, so it's hard to offer any concrete suggestions for this particular problem. When @groma has finished travelling, he might have some ideas too.

Hi @weefuzzy, thanks for your tips! I imagined this could be the case :slight_smile:
As for the soundfile, it's weird indeed: I just copied a Dropbox link. If you right-click and copy the audio address, you can paste it somewhere else. In any case, there's no need for you to hear it; what you say is already very clear and reasonable :wink:

thanks again,
d

I know you are not allergic to code, and I have been playing with this recently, so perhaps it's of interest to you.

https://librosa.github.io/librosa/auto_examples/plot_vocal_separation.html#sphx-glr-auto-examples-plot-vocal-separation-py

Thanks @jamesbradbury, I didn't know about that resource. The method is nice and clever, though unfortunately it doesn't sound that great on my example…

I'm under the impression that the best results in vocal source separation are achieved by deep networks, but I'm not aware of one that's ready to use for this.

I think I'll get by with the chunky NMF for now, or with manual masking, no big deal :slight_smile:

Again, thanks to all of you for your pointers!

The trouble there is, of course, training them, as you need both time and data in great quantities! A lot of the research-code networks can get good results, but I think (@groma, correct me) that many of the available ones don't share their trained weights, and are often trained on vocals, guitar, bass + "other" stems.

Hello gang,
Coming late to this conversation, but to add that Probabilistic Latent Component Analysis (PLCA), especially the 2D and shift-invariant variants, is great for this.
A minimal interactive interface is up here (not by me, but based on our paper https://peerj.com/articles/2108.pdf):
http://rfabbri.vicg.icmc.usp.br:3000/soundscape
The hyperparameters have been tweaked for soundscapes rather than music.

These models are based on Michael Casey's Bregman toolkit: https://github.com/bregmanstudio/SoundscapeEcology/blob/master/SoundscapeComponentAnalysis.ipynb
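For anyone who wants to see the machinery, here is a minimal numpy sketch of plain (1D, non-shift-invariant) PLCA via EM; the 2D shift-invariant variant in the paper adds convolutive kernels on top of the same idea. All names and sizes are illustrative.

```python
import numpy as np

def plca(V, n_z=2, n_iter=100, seed=0):
    """Minimal 1D PLCA via EM: factorise a nonnegative V (freq x time)
    as P(f, t) = sum_z P(z) * P(f|z) * P(t|z)."""
    rng = np.random.default_rng(seed)
    P = V / V.sum()                                   # spectrogram as a distribution
    n_f, n_t = P.shape
    Pf = rng.random((n_f, n_z)); Pf /= Pf.sum(axis=0)            # P(f|z)
    Pt = rng.random((n_z, n_t)); Pt /= Pt.sum(axis=1, keepdims=True)  # P(t|z)
    Pz = np.full(n_z, 1.0 / n_z)                                 # P(z)
    for _ in range(n_iter):
        # E-step: posterior over components z for every (f, t) cell
        R = Pf[:, :, None] * Pz[None, :, None] * Pt[None, :, :]  # (f, z, t)
        R /= R.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight the posterior by the data and renormalise
        Q = R * P[:, None, :]
        Pz = Q.sum(axis=(0, 2))
        Pf = Q.sum(axis=2) / (Pz[None, :] + 1e-12)
        Pt = Q.sum(axis=0) / (Pz[:, None] + 1e-12)
        Pz /= Pz.sum()
    return Pf, Pz, Pt

# Toy usage on a random nonnegative "spectrogram"
V = np.random.default_rng(1).random((32, 50))
Pf, Pz, Pt = plca(V, n_z=3)
```

Structurally this is NMF with built-in normalisation, which is what makes the shift-invariant extensions (and priors on the distributions) straightforward to bolt on.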

Alice


The link is dead for me. Is it possible the server went down?

Somehow I missed this thread, but certainly using neural networks would give the best results. In the context of the Fluid Decomposition toolbox, I guess it would be interesting to combine NMF with pitch tracking and/or onset segmentation, so you can process the notes separately, depending on the level of automation needed…