I'm assembling a patch with "fluid.mlpregressor", in which I train the neural network to give me the closest slice of a corpus according to a live audio analysis.
The network is trained with two plotters on which I select my points; the points are stored in datasets and then I fit. One dataset is filled with a series of slice numbers, the other with 13 MFCC coefficients. The goal is that when I analyse live audio, I predict the closest slice in my corpus.
I saw in the documentation that I should use "input activation function 0" (identity), as the MFCCs vary below zero and above 1, and "output activation function 2" (ReLU), as my output will be a slice number equal to or above 0. The rest of the parameters are set as in the "pure data regressor example".
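To show the shape of the idea (not my actual patch), here is a rough Python sketch, with sklearn standing in for fluid.mlpregressor and made-up data:

```python
# Rough sketch only: 13 MFCC coefficients in, one slice number out.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(12, 13))            # 12 hand-picked training points, 13 MFCCs each
slice_number = rng.integers(0, 50, size=12).astype(float)  # the slice each point should map to

net = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0)
net.fit(mfcc, slice_number)

live_mfcc = rng.normal(size=(1, 13))        # stand-in for one frame of live analysis
print(net.predict(live_mfcc))               # predicted slice number (rounded in practice)
```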
I wanted to test it with only 12 points. When I hit "fit", the error is 9535.35; if I fit it again it goes up or down a tiny bit.
So I thought I should ask on this forum: am I setting things up right in the MLP regressor object, and does this idea make sense?
At first I tried matching the 13 MFCC coefficients of the input sound with the closest 13 coefficients in the audio corpus. My idea was to train a few points, then test with live audio input and add points until I find something interesting to play with. I'm doing this very naively, as it's my first attempt at using an MLP and my knowledge is limited. Any light welcome!
The biggest problems you're facing so far are probably that you're not using enough training data (only 12 points), and that the network isn't big enough. With just a single hidden layer of 5 activations, you're saying that the nearest-neighbour problem can be reduced to 5 (mildly) nonlinear combinations of 13 input dimensions. As a rule of thumb for neural networks you want much more training data than you have parameters to learn (for an MLP, each neuron has one weight per input plus a bias), but also, in general, much more training data than input dimensions.
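To put numbers on that, here is a quick count of weights and biases for a 13-in, 5-hidden, 1-out network (a Python sketch; the exact count inside fluid.mlpregressor may differ slightly):

```python
# Parameter count for an MLP with layer sizes 13 -> 5 -> 1:
# each neuron has one weight per input plus one bias.
layers = [13, 5, 1]
params = sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)  # (13+1)*5 + (5+1)*1 = 76 parameters, versus only 12 training points
```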
For the other settings:
activation doesn’t matter so much until you have a handle on the size of the network that you need.
turn validation off for now; with so little data, it’s not helping
your learnrate is the most crucial parameter to start with. You have to find the useful value for this particular data-model pairing by trial and error. If you're seeing the training loss jump around, it's probably too high: shift down coarsely (orders of magnitude at first) and home in on a range where a round of training consistently lowers the loss (use a much lower maxiter whilst you do this, perhaps even 1, and call fitpredict multiple times to track the loss; there's a sketch of this below)
ReLU is often not a good choice for the output activation because its range is unconstrained. I can see the logic, but it might make training much harder: if you're trying to predict indices of data points, it could be easier to scale the output range into [0, 1] on the way in and back up to the full range on the way out (e.g. with fluid.normalize)
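Here is a sketch of that training loop and the output scaling in Python, with sklearn's MLPRegressor standing in for fluid.mlpregressor (the data and learning rates are arbitrary, just to show the procedure):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_slices = 50
X = rng.normal(size=(n_slices, 13))       # stand-in for 13 MFCC coefficients per training point
y = np.arange(n_slices, dtype=float)      # slice numbers 0..49

# Scale the target into [0, 1] for training (the fluid.normalize idea),
# keeping the factor so predictions can be scaled back up afterwards.
y_scaled = y / (n_slices - 1)

# Train in short bursts (low max_iter) and watch the loss: if it jumps around,
# the learning rate is probably too high, so drop it by an order of magnitude.
for lr in (0.1, 0.01, 0.001):
    net = MLPRegressor(hidden_layer_sizes=(5,), learning_rate_init=lr,
                       max_iter=20, warm_start=True, random_state=0)
    for step in range(10):
        net.fit(X, y_scaled)              # one short round of training
        print(f"lr={lr}  step={step}  loss={net.loss_:.4f}")

pred = net.predict(X[:1]) * (n_slices - 1)   # back up to a slice number
print(pred)
```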
A final query is why you are using an MLP for this job rather than a KD-tree: if you just want to get the job done, then the tree is going to be easier (by far!). If you’re just having fun / learning, then don’t let me stand in your way.
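For comparison, the KD-tree version of the lookup is only a few lines (again a Python sketch, with scipy standing in for fluid.kdtree):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
corpus_mfcc = rng.normal(size=(200, 13))    # one 13-dimensional MFCC vector per corpus slice
tree = cKDTree(corpus_mfcc)

live_frame = rng.normal(size=13)            # stand-in for a live analysis frame
dist, slice_index = tree.query(live_frame)  # nearest slice in the corpus
print(slice_index, dist)
```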
I've learnt the basics of using a kdtree to find neighbour matches; now when I try to compare corpuses that do not have much in common, it always finds the same region of points.
I'm generally comparing chunks of 10 ms against chunks of 10 ms, trying to produce sound textures based on other sound textures (I would never do that if FluCoMa did not exist in the first place!). Now obviously I want to improve on what I've got…
I thought about using ML like this: manually picking a few points from an audio sample of my input, with their MFCC coefficients, and matching them to points of a corpus that I also pick manually, getting their MFCC coefficients as well. So instead of matching neighbours, it matches according to what I've decided, and also finds interpolations. (Again, I'm guessing this is doable; maybe there are easier ways of achieving that, "that" being exploring a corpus with a live audio input that has not much in common with it.)
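Roughly what I mean, sketched in Python (sklearn and scipy standing in for the FluCoMa objects, with made-up data and hand-picked pairs):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)

# Hand-picked pairs: row i of picked_input should map to row i of picked_corpus.
picked_input = rng.normal(size=(20, 13))    # MFCCs of points chosen from my live material
picked_corpus = rng.normal(size=(20, 13))   # MFCCs of the corpus points chosen for them

# Learn a mapping from "input space" MFCCs to "corpus space" MFCCs.
mapper = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mapper.fit(picked_input, picked_corpus)

# At play time: map a live frame into corpus space, then find its nearest corpus slice.
corpus_mfcc = rng.normal(size=(200, 13))
tree = cKDTree(corpus_mfcc)
live_frame = rng.normal(size=(1, 13))
dist, slice_index = tree.query(mapper.predict(live_frame)[0])
print(slice_index)
```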
I thought about another way of doing this: first dividing both my corpuses into many clusters, so I would just classify clusters with ML, then use a kdtree to match MFCC coefficients from the input cluster with coefficients of the corresponding corpus cluster.
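And a sketch of that cluster-then-match idea (KMeans standing in for fluid.kmeans; pairing input cluster c with corpus cluster c is just an assumption here, in practice I would have to decide how the clusters correspond):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
input_corpus = rng.normal(size=(300, 13))   # MFCCs of the "live-like" corpus
target_corpus = rng.normal(size=(300, 13))  # MFCCs of the corpus I want to play from
k = 8

# Cluster each corpus separately.
km_in = KMeans(n_clusters=k, n_init=10, random_state=0).fit(input_corpus)
km_out = KMeans(n_clusters=k, n_init=10, random_state=0).fit(target_corpus)

# One KD-tree per target cluster, so matching stays inside the paired cluster.
trees = {c: (cKDTree(target_corpus[km_out.labels_ == c]),
             np.flatnonzero(km_out.labels_ == c)) for c in range(k)}

live_frame = rng.normal(size=(1, 13))
c = int(km_in.predict(live_frame)[0])       # which input cluster the live frame falls in
tree, idx = trees[c]                        # paired (by index) target cluster
dist, local = tree.query(live_frame[0])
print(idx[local])                           # matched slice index in the target corpus
```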
I've read a lot on the forum to get inspiration about what I could use FluCoMa for and ways of doing things, though I can't read SC code and can't open Max patchers, which makes the learning curve even steeper.
For example, I read about "distorting" a corpus so it matches another more closely, but I lack understanding and ideas about how to do that distortion.
I thought about adding ML to my work in order to explore the ML objects a bit, and also because I like the idea of tuning a model bit by bit until it gives expressive and "meaningful" results.
I'm having loads of fun and learning a lot, even learning more about Pure Data, as FluCoMa makes me go out of my comfort zone.
But I've also started incorporating FluCoMa in my work as a musician (for a show with performers, and another show with spoken-word poetry), using it only for tiny things at the moment. It has got me learning faster, as I can't just play around indefinitely but have to get results.
The FluCoMa library is brilliant, exactly the kind of tool that fits my area of exploration, adding new dimensions to it. Cheers!!!