Softmax output activation?

Hello everyone!

I was wondering if the softmax function could be added to the @outputactivation options for the output layer of an MLP network.

That should be very easy to code, but at the same time very useful to whoever wants to sample stochastically from a one-hot-encoded output distribution (possibly with a temperature parameter? That requires another attribute, but it would be massively useful…)

I can of course code it myself, and I will for my own projects, but right now I’m working on an online course on ML and symbolic music, and I’m using flucoma as a fundamental tool for the ML side (more about this in future threads… I’ll be sharing some patches with you, and hopefully I’m not making too much of a fool of myself by using flucoma too badly – I’ll ask you for more advice on the subject…). It’s hard to explain the mathematical nuts and bolts of softmax; it would be easier just to take them for granted. I will end up making an abstraction that I hand out and take for granted, but it would be nice if it were embedded in the object, I think!
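
For reference, the maths I’m asking for is tiny – here’s a minimal Python sketch of softmax with a temperature parameter (the function name and example logits are mine, nothing to do with FluCoMa’s actual internals):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature: low temperature sharpens towards argmax,
    high temperature flattens towards a uniform distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]                           # made-up network outputs
probs = softmax(logits, temperature=0.5)
choice = np.random.choice(len(probs), p=probs)     # stochastic one-hot sample
```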

Best,
Daniele


@danieleghisi It’s on my list! I think, ideally, we’d want the option to not one-hot encode the outputs for the classifier too.


@weefuzzy Fantastic, Owen! Whatever way you choose to convert from logits to probabilities would be welcome. And yes – that may be useful in the classifier too, of course, but also in the regressor, which I often use as an “undercover” multi-label classifier. And the LSTM tool may benefit from temperature too! :slight_smile:

BTW my wording was ambiguous and I’ve changed it. What I meant here is that I hope I won’t be butchering the fundamental tenets of flucoma in my online course… I’ll be posting a couple of examples later next week, and if any of you spot major problems, I’d be delighted to get a big red warning before shooting the video lecture in June :slight_smile:


@weefuzzy I thought we softmaxed our classifier’s output vector (hence not having the outputActivation parameter)… but looking at the code it doesn’t seem so. Or I am still a noob, which I am.

woaaa, a flucoma-related video lecture incoming? I’m now counting the days to June :heart:

I’ll bear all that in mind :slight_smile: So far I’ve had nothing to do with the ongoing LSTM experiments, but sure, when it looks like it’s getting merged we’ll want to make sure there’s parity between the tools.


…except that:

  1. it will be part of an Italian university MOOC series and thereby not open to everyone, but only to students of the academies that enroll – or to individuals who pay a fee (no idea how much), e.g. https://www.pok.polimi.it/; and
  2. … it will be in Italian :frowning:

These constraints come with the commission.
Still, I don’t want to mess things up too much, so an informal eye on a couple of patches would be amazing…

:heart:

I’ve started considering teaching classification using the MLP Regressor and one-hot encoding. I think it’s doable, and students can expand on it in various ways – including multi-hot encoding. A softmax implementation would be a great extension of this.
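
To make the idea concrete, something like this is what I mean (a toy Python sketch – the labels and numbers are invented, and in practice the targets and outputs would of course live in buffers/datasets):

```python
import numpy as np

classes = ["kick", "snare", "hat"]        # made-up labels, just for illustration

def one_hot(label):
    """Regressor target: 1 at the class index, 0 elsewhere."""
    v = np.zeros(len(classes))
    v[classes.index(label)] = 1.0
    return v

# After training, read the regressor's output vector back as a class:
output = np.array([0.1, 0.7, 0.2])            # hypothetical regressor output
predicted = classes[int(np.argmax(output))]   # -> "snare"
# ...or softmax the output and sample stochastically instead of taking argmax.
```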

Maybe, in addition to an activation function, it could be a buffer-to-buffer transform with a temperature argument?

You mean something that takes an n-point buffer and returns its softmax (with temperature)? FluidBufSoftmax(buf, outbuf, temp) kind of thing?

1 Like

That is what I’m thinking. That way it could be used on the output of an MLP, but could also be used to tweak a probability distribution over a vector of inverse distances (more similar = more likely to be sampled), or some other measure where one wants to sample from a distribution with some “temperature” control.
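
Roughly like this, say (a quick Python sketch of the sampling side only – here I get “closer = more likely” by softmaxing the negated distances; in practice the distances might come from a KDTree query and sit in a buffer, but the maths is the same):

```python
import numpy as np

def sample_by_similarity(distances, temperature=1.0):
    """Softmax over negated distances: closer items get higher probability.
    Temperature > 1 flattens the choice, < 1 sharpens it towards the nearest."""
    d = np.asarray(distances, dtype=float)
    logits = -d / temperature
    logits = logits - logits.max()     # numerical stability
    p = np.exp(logits)
    p = p / p.sum()
    return np.random.choice(len(p), p=p)

# e.g. with distances from a nearest-neighbour query: the closest entries are
# picked most often, but a higher temperature occasionally lets others through.
idx = sample_by_similarity([0.2, 0.5, 1.3, 2.0], temperature=0.8)
```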