For those of you who tried to read @groma’s paper from last year and struggled a bit, a more general post has been written here
Sadly, the article does not give proper references to this immense, hyperactive field, as @groma kindly pointed out to me, and one of the world leaders in the field replied to it along those lines here
Wow, that unmix app is really surgical at separating the different components. I imagine some serious training was done on some big clusters. @groma, do you know more?
You should check the comparison website - there are still a lot of spectral processing artefacts, but the results for vocals are indeed good (and @groma’s were good).
Hi, here’s a good resource on this topic: https://sigsep.github.io/
For big networks, you generally need a good GPU with as much memory as possible. Multi-GPU setups are not trivial, and I don’t know whether other people are using them. The network in the post is pretty big, but they mention using a single GPU.