Making sense of fluid.mlpregressor~'s autoencoder

Google never prematurely kills off projects that are enjoyed and used by many people.(!)

And after banging my head against the wall to get this working, this may be the case here:

No more Swift in Colab. But they do have Python.

1 Like

I don’t know what Colab necessarily offers over just running it locally, unless you really want to do GPU training, which is problematic in and of itself. In my experience it’s not much faster on the free tier than just running something on my machine, and then you don’t have vendor lock-in to boot. I’ve already made an adapter for Python > Dataset in python-flucoma · PyPI if anyone wants to try it out. That would get you halfway to transforming stuff pretty easily for the Max side of things. They might be outdated though, given how things have changed and my lack of involvement coding anything lately.

EDIT:

I realise I sound kinda narky - I had some bad times with early Google Colab. What I didn’t think about was how it simplifies the process of working with Python for people who don’t want to set up a local dev environment (which is a PITA sometimes). I wonder how easy it is to get audio up into a Colab notebook?
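
For what it’s worth, something like this seems to be the usual route (the google.colab helpers only exist inside a Colab runtime, and the filenames printed here are just whatever you pick in the dialog):

from google.colab import files

uploaded = files.upload()  # opens a file picker in the browser; returns {filename: bytes}
for name in uploaded:
    print(name, len(uploaded[name]), 'bytes')

# or mount Google Drive and read audio straight from there:
# from google.colab import drive
# drive.mount('/content/drive')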

Also, just to clarify: Colab runs Python and is not a language itself, so you are tied to writing Python if you do want to use it.

And just to continually add fuel to the fire, I would be super interested in training something like this to be used in Max:

https://towardsdatascience.com/one-shot-learning-with-siamese-networks-using-keras-17f34e75bb3d

They looked pretty snazzy and contemporary to me when I first looked, which is a while ago now, but perhaps super relevant to those who want to do classification with not much data.

1 Like

One shot learning in 2005 on HSN:

2 Likes

That is a cute dog :dog:

1 Like

I was thinking about this today as I was planning on recording a much more comprehensive set of “sounds I can make with my snare”. Knowing that the corpus would be so specific to the snare, the head, the tuning, and the room (to a certain extent), and that it wouldn’t necessarily translate if I went to a gig and used another snare, or even just had my head drift in tuning over time, is a bit of a bummer.

So that led me down a couple paths of thinking.

  • creating the minimum viable corpus for any given snare (maximum variety/dynamics, with a generous helping of data augmentation to fill in the gaps; see the sketch after this list)
  • creating a monolithic corpus for each setup I have and just streamlining that process
  • thinking about the viability of having a mega-chunky-corpus that is continuously fed new snares/setups/tunings and keeps getting bigger every time I use the system with a new drum
  • wondering if it’s somehow possible to train a NN on some kind of archetypical aspects of the sounds (within the world of “short attacks on a snare”), which is then made a bit more specific with samples of the exact snare in any given setup
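
As a rough illustration of the augmentation idea in the first bullet, something along these lines (assuming librosa and soundfile are installed; 'hit.wav' is just a stand-in for one recorded snare hit):

import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load('hit.wav', sr=None)

# simple variants to fill in the gaps between recorded hits
variants = {
    'detune_down': librosa.effects.pitch_shift(y, sr=sr, n_steps=-1),  # tuning drift
    'detune_up': librosa.effects.pitch_shift(y, sr=sr, n_steps=1),
    'slower': librosa.effects.time_stretch(y, rate=0.9),
    'quieter': y * 0.5,                                                # dynamics
    'noisy': y + 0.002 * np.random.randn(len(y)),                      # room/bleed-ish
}

for name, v in variants.items():
    sf.write(f'hit_{name}.wav', v, sr)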

Part of that last example was remembering the topology of the machine learning snare thing that I was looking into a while back:
(image: topology diagram from the patent application)

It could just be that this makes sense for the purposes of the patent application, but from the looks of it, the NN is trained on data that is distinct from the user-generated and trained aspects. In fact, I remember when I last used the software, you would go into a training mode, give it around 50 hits of any given zone (“snare center”, “snare edge”, etc…), and then come out of training mode and it worked immediately. There was never any computation that went along with it (unless it happened as you went and was super super super fast). You literally toggled in and out of training mode à la a classifier. But there’s an NN involved somewhere/somehow. How?

It’s likely it’s just super optimised for one specific purpose. No fucking around with memory allocation - it’s probably all pre-allocated for everything that is needed. It’s certainly not implausible that it trains fast enough with such small datasets too, given the level of optimisation they could put toward a small use case.

It’s a patent, so purposefully quite vague on certain details – if you read the text, there’s a lot of ‘may’ going on. However, if one perseveres, then some general impression seems to come out.

For example, one of the things they ‘may’ be doing is using a ‘Siamese’ network architecture (like @jamesbradbury posted the other day, and I can see him replying now) in such a way as to learn a distance function from labelled data which can then be applied to unlabelled data. I don’t know how much that specific trick is still popular, but there’s still a lot of active research into metric learning (i.e. learning a distance function) and transfer learning (training a general model first, and then making it – quickly – more specific with some extra examples).
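
To make the Siamese idea a bit more concrete, here’s a very rough sketch (assuming tf.keras; the 13-dimensional input and layer sizes are placeholders, not anything taken from the patent). Two inputs share one embedding network, the training target is whether a pair of examples carries the same label, and what effectively gets learned is a distance:

import tensorflow as tf
from tensorflow.keras import layers, Model

N_DIMS = 13  # e.g. per-hit descriptor size -- purely illustrative

# shared embedding network: both inputs pass through the same weights
embed = tf.keras.Sequential([
    layers.Dense(32, activation='relu'),
    layers.Dense(8),  # the learned embedding space
])

in_a = layers.Input(shape=(N_DIMS,))
in_b = layers.Input(shape=(N_DIMS,))
emb_a, emb_b = embed(in_a), embed(in_b)

# distance between the two embeddings; training pushes same-label pairs
# together and different-label pairs apart
dist = layers.Lambda(lambda t: tf.norm(t[0] - t[1], axis=1, keepdims=True))([emb_a, emb_b])
out = layers.Dense(1, activation='sigmoid')(dist)  # 1 = 'same label'

siamese = Model([in_a, in_b], out)
siamese.compile(optimizer='adam', loss='binary_crossentropy')
# siamese.fit([pairs_a, pairs_b], same_or_not, epochs=...)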

1 Like

It looks like some of the ongoing research with UMAP looks at these cases: UMAP for Supervised Dimension Reduction and Metric Learning — umap 0.5 documentation

(but that’s not in ours)
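
For anyone curious, in the Python umap-learn package (not the FluCoMa objects, as noted) the supervised version is roughly just passing labels to fit; the data here is placeholder noise, purely for shape:

import numpy as np
import umap

X = np.random.rand(200, 13)            # placeholder descriptors
y = np.random.randint(0, 4, size=200)  # placeholder class labels

# passing y makes the embedding respect the labels (supervised UMAP)
mapper = umap.UMAP(n_neighbors=15, n_components=2).fit(X, y)
embedding = mapper.embedding_

# new, unlabelled points can then be projected into the same learned space
projected = mapper.transform(np.random.rand(10, 13))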

1 Like

It’s even referenced at the end of the UMAP article, so it must still be cool :wink:

Most interesting.

Yeah, the way the system behaves is like a vanilla classification thing where you give it examples, and then that’s it. So I was never sure how an NN fit into the equation.

I’m assuming it’s not nearly as simple as this, but would the idea be that (using my multiple snares example/context) I could train a large set on given sounds, and then when presented with new variants, the overall “distances” would hold up and still be relevant/useful, by (I guess) somehow transposing/stretching the existing points?

Aha! This sounds more like what I’m thinking.

Also, are both of these (metric/transfer) limited to classification or does the paradigm apply for regression as well?

In response to @spluta:

https://github.com/tedmoore/FluCoMa-stuff/blob/master/sklearn_mlp_to_fluid_mlp.py
Should still work if you’re looking at sklearn. If it doesn’t, let me know and I can poke at it. You’re probably looking at Keras though @spluta?

EDIT: Also note that the output activation will always be identity. This is the default (and only?) setting for sklearn’s MLP.
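
If it helps, a quick way to confirm that on a fitted model (toy data, purely for illustration):

import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(100, 4)
y = np.random.rand(100, 2)

mlp = MLPRegressor(hidden_layer_sizes=(8,), activation='tanh', max_iter=1000).fit(X, y)
print(mlp.activation)       # hidden-layer activation: 'tanh' here, whatever you set
print(mlp.out_activation_)  # 'identity' -- the output activation for MLPRegressor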

I don’t think that’s quite right, because relu and tanh won’t ever get used, even though they’re valid. Perhaps something like

activation_map = {'identity': 0, 'logistic': 1, 'relu': 2, 'tanh': 3}
activation = activation_map[mlp.activation]

?

Can one not string together multiple elifs in Python? Regardless, your implementation is much more elegant. Will edit.
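
(What I had in mind was roughly this, i.e. the if/elif version the dict lookup replaces:)

# equivalent if/elif chain -- works fine, just wordier than the dict lookup
if mlp.activation == 'identity':
    activation = 0
elif mlp.activation == 'logistic':
    activation = 1
elif mlp.activation == 'relu':
    activation = 2
elif mlp.activation == 'tanh':
    activation = 3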

You can; sorry, I hadn’t noticed that the preview was truncated. I am, however, allergic to big if trees :laughing:

2 Likes

That is a good allergy to develop. :sneezing_face:

This is awesome! No, I was looking at sklearn. Thanks for this.

Sam

The better way to do this is to create a class for each activation with its own custom type. You can then use isinstance() to check which type of class it is.

/s :slight_smile:

1 Like