Thanks Jan. That’s a really interesting paper. We don’t include a convolutional neural network in the toolkit, although @groma has done a great deal of research around them (and this certainly informs our choices about what we have included so far).
CNNs are pretty heavyweight, and for working with audio data it’s unlikely that they are practical to train or run on current hardware without access to a GPU. I see that the paper has some discussion about how to reduce the complexity of their model, but even so, it remains very heavyweight.
At one point we did look at the idea of wrapping a general-purpose ML toolkit like Torch. However, we were deterred by the added complexity it would entail in both development and usage, especially whilst we’re still collectively figuring out what sorts of approach are desirable / feasible in environments like Max or SC.