Hi guys!
What would be the best strategy to separate a soprano from an orchestra (mixed together in a sample)?
As a utopian goal, I would like to have one buffer containing the whole voice and one buffer containing the whole orchestra. I know reality is harder than this, and I know that neural networks are better suited for this task than NMF, but fluid.bufnmf~ is so much handier.
I've tried different parametrizations (and ranks), but I haven't been able to isolate notes reasonably well (the vibrato doesn't help; the sample is also time-stretched, but that doesn't seem to make a difference). I think I can more or less simulate what I want by splicing together different portions of the processed buffer, but I was wondering if there was a better strategy.
Here's a portion of the source sample:
I'm starting from the "basic example" tab in the help file, modifying the rank (from 2 to 10), the iterations (increased to 150), and the analysis parameters (I tried increasing them, unsuccessfully).
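For reference, the same idea outside Max looks roughly like this in Python; a sketch using librosa and scikit-learn rather than the FluCoMa objects, with "mix.wav" standing in for my sample:

```python
# Rough Python equivalent of the bufnmf~ "basic example" workflow,
# using librosa + scikit-learn instead of the FluCoMa objects.
# "mix.wav" is a placeholder for the soprano+orchestra sample.
import numpy as np
import librosa
import soundfile as sf
from sklearn.decomposition import NMF

y, sr = librosa.load("mix.wav", sr=None)
S = librosa.stft(y, n_fft=2048, hop_length=512)   # analysis settings ~ fftsettings
V = np.abs(S)

# n_components ~ rank, max_iter ~ iterations
model = NMF(n_components=10, max_iter=150, init="nndsvd")
W = model.fit_transform(V)      # spectral bases, one column per component
H = model.components_           # per-component activations over time

# Resynthesise each component with a soft mask on the original phase
for k in range(W.shape[1]):
    Vk = np.outer(W[:, k], H[k])
    mask = Vk / (W @ H + 1e-10)
    yk = librosa.istft(mask * S, hop_length=512)
    sf.write(f"component_{k}.wav", yk, sr)
```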
If anyone has a hint to make it better, that would be cool!
In the meantime, I'll keep tweaking parameters.
This may well be beyond the talents of NMF, especially if you don't have some isolated samples to make templates from. Basic NMF like this can't group different pitches from the same source, so you'd want as many components (set with rank) as each source has discrete pitches (and, yes, vibrato complicates this).
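To illustrate what templates buy you: with even a little isolated material per source, you can learn bases from each, then keep those bases fixed and only fit activations on the mixture. Here's a hedged Python sketch of that supervised-NMF idea (the filenames are hypothetical placeholders, and the activation updates are hand-rolled multiplicative updates for the Euclidean cost):

```python
# Hedged sketch: supervised NMF with fixed templates. Learn bases from
# short isolated recordings (hypothetical files), then fit only the
# activations H on the mixture spectrogram, keeping the bases W fixed.
import numpy as np
import librosa
from sklearn.decomposition import NMF

def mag(path, n_fft=2048, hop=512):
    y, sr = librosa.load(path, sr=None)
    return np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)), sr

V_voice, _ = mag("voice_alone.wav")   # placeholder isolated material
V_orch, _ = mag("orch_alone.wav")
V_mix, sr = mag("mix.wav")

# Learn a small dictionary of templates for each source separately
W_voice = NMF(n_components=8, max_iter=200).fit_transform(V_voice)
W_orch = NMF(n_components=8, max_iter=200).fit_transform(V_orch)
W = np.hstack([W_voice, W_orch])      # fixed combined dictionary

# Multiplicative updates for H only (Euclidean cost); W never changes
H = np.random.rand(W.shape[1], V_mix.shape[1])
for _ in range(150):
    H *= (W.T @ V_mix) / (W.T @ W @ H + 1e-10)

# The first 8 rows of H drive the voice templates, the rest the orchestra
V_voice_est = W_voice @ H[:8]
```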
Unfortunately, I can't play your sample for some reason, so it's hard to offer any concrete suggestion for this particular problem. When @groma has finished travelling, he might have some ideas too.
Hi @weefuzzy, thanks for your tips! I imagined this could be the case.
As for the soundfile, that's weird indeed: I just copied a Dropbox link. If you right-click and copy the audio address, you can paste it somewhere else. In any case, there's no need for you to hear it; what you say is already very clear and reasonable.
Thanks @jamesbradbury, I didn't know that resource. The method is nice and clever, though unfortunately it doesn't sound that great on my example…
I'm under the impression that the best results in vocal source separation are achieved by deep networks, but I'm not aware of one that's ready to use for this.
I think I'll get by with the chunky NMF for now, or with manual masking; no big deal.
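For anyone curious, the manual-masking route I mean is roughly this (a sketch, untested on my material: S, W, H, and sr come from the decomposition sketch earlier in the thread, and the component indices are hypothetical choices you'd make by ear):

```python
# Hedged sketch of the "manual masking" route: pick by ear which NMF
# components belong to the voice, build a soft mask, apply to the mixture.
# S, W, H, sr are assumed from the earlier decomposition sketch.
import numpy as np
import librosa
import soundfile as sf

voice_idx = [0, 3, 5]                          # hypothetical, chosen by ear
other_idx = [k for k in range(W.shape[1]) if k not in voice_idx]

V_voice = W[:, voice_idx] @ H[voice_idx]       # voice-only reconstruction
V_other = W[:, other_idx] @ H[other_idx]       # everything else
mask = librosa.util.softmask(V_voice, V_other, power=2)  # Wiener-style

y_voice = librosa.istft(mask * S, hop_length=512)
y_orch = librosa.istft((1 - mask) * S, hop_length=512)
sf.write("voice_est.wav", y_voice, sr)
sf.write("orch_est.wav", y_orch, sr)
```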
The trouble there is, of course, training them, as you need both time and data in great quantities! A lot of the research-code networks can get good results, but I think (@groma correct me) that many of the available ones don't share their trained weights, and are often predicated on vocals, guitar, bass + other training data.
Hello gang,
Coming late to this conversation, but to add that Probabilistic Latent Component Analysis (PLCA), especially the 2D and shift-invariant variants, is great for this.
There's a minimal interactive interface up here (not by me, but based on our paper https://peerj.com/articles/2108.pdf): http://rfabbri.vicg.icmc.usp.br:3000/soundscape
The hyperparameters there have been tweaked for soundscapes rather than music, though.
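For the curious, plain PLCA is just a few lines of EM; here's a minimal sketch (of the basic model only, not the 2D shift-invariant variant from the paper):

```python
# Minimal plain PLCA via EM (not the 2D shift-invariant variant, which
# additionally models spectral shifts and time-frequency kernels).
import numpy as np

def plca(V, n_components=10, n_iter=100, eps=1e-10):
    """Factor a magnitude spectrogram V (freq x time) as
    P(f,t) ~= sum_z P(z) P(f|z) P(t|z)."""
    F, T = V.shape
    P = V / (V.sum() + eps)                  # treat V as a distribution
    Pz = np.random.rand(n_components)
    Pf = np.random.rand(F, n_components)
    Pt = np.random.rand(n_components, T)
    Pz /= Pz.sum(); Pf /= Pf.sum(0); Pt /= Pt.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|f,t), shape (F, T, Z)
        joint = Pf[:, None, :] * Pt.T[None, :, :] * Pz[None, None, :]
        post = joint / (joint.sum(-1, keepdims=True) + eps)
        # M-step: reweight the posteriors by the data and renormalise
        Q = P[:, :, None] * post
        Pf = Q.sum(1); Pf /= Pf.sum(0) + eps
        Pt = Q.sum(0).T; Pt /= Pt.sum(1, keepdims=True) + eps
        Pz = Q.sum((0, 1)); Pz /= Pz.sum() + eps
    return Pz, Pf, Pt
```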
Somehow I missed this thread, but certainly using neural networks would give the best results. I guess in the context of the fluid decomposition toolbox it would be interesting to combine NMF with pitch tracking and/or onset segmentation, so you can do the notes separately, depending on the level of automation needed…
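Something like this sketch, say, with librosa's onset detector standing in for a FluCoMa slicer and a small-rank NMF per slice (all parameters here are guesses):

```python
# Hedged sketch: slice the mixture at onsets, then run a small-rank NMF
# per slice so each note gets its own components.
import numpy as np
import librosa
from sklearn.decomposition import NMF

y, sr = librosa.load("mix.wav", sr=None)   # placeholder filename
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="samples", backtrack=True)
bounds = np.concatenate([[0], onsets, [len(y)]])

per_note = []
for start, end in zip(bounds[:-1], bounds[1:]):
    seg = y[start:end]
    if len(seg) < 2048:
        continue                           # skip slices shorter than one frame
    V = np.abs(librosa.stft(seg, n_fft=2048, hop_length=512))
    model = NMF(n_components=2, max_iter=150)  # small rank per note
    W = model.fit_transform(V)
    per_note.append((start, W, model.components_))
```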