Softmax output activation?

Hello everyone!

I was wondering if the softmax function could be added to the @outputactivation options for the output layer of an MLP network.

That should be very easy to code, but at the same time very useful to whoever wants to sample stochastically from a one-hot-encoded output distribution (possibly with a temperature parameter? That requires another attribute, but it would be massively useful…)

I can of course code it myself, and I will for my own projects, but right now I’m working on an online course on ML and symbolic music, and I’m using flucoma as a fundamental tool for the ML side (more about this in future threads… I’ll be sharing some patches with you, and hopefully I’m not making too much of a fool of myself by using flucoma too badly – I’ll ask you for more advice on the subject…). It’s hard to explain the mathematical nuts and bolts of softmax; it would be easier just to take them for granted. I will end up making an abstraction that I hand out and take for granted, but it would be nice if it were embedded in the object, I think!
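
For reference, the maths I’m asking for is tiny – here’s a minimal Python sketch of softmax with a temperature parameter (the function name and example logits are mine, nothing to do with FluCoMa’s actual internals):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature: low temperature sharpens towards argmax,
    high temperature flattens towards a uniform distribution."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]                           # made-up network outputs
probs = softmax(logits, temperature=0.5)
choice = np.random.choice(len(probs), p=probs)     # stochastic one-hot sample
```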

Best,
Daniele


@danieleghisi It’s on my list! I think, ideally, we’d want the option to not one-hot encode the outputs for the classifier too.


@weefuzzy Fantastic, Owen! Whatever way you choose to convert from logits to probabilities would be welcome. And yes – that may be useful in the classifier too, of course, but also in the regressor, which I often use as an “undercover” multi-label classifier. And the LSTM tool may benefit from temperature too! :slight_smile:

BTW my wording was ambiguous and I’ve changed it. What I meant here is that I hope I won’t be butchering the fundamental tenets of flucoma in my online course… I’ll be posting a couple of examples later next week, and if any of you spot major problems, I’d be delighted to get a big red warning before shooting the video lecture in June :slight_smile:


@weefuzzy I thought we softmaxed our classifier’s output vector (hence not having the outputActivation parameter)… but looking at the code it doesn’t seem so. Or I am still a noob, which I am.

woaaa, a flucoma-related video lecture incoming? I’m now counting the days to June :heart:

I’ll bear all that in mind :slight_smile: So far I’ve had nothing to do with the ongoing LSTM experiments, but sure, when it looks like it’s getting merged we’ll want to make sure there’s parity between the tools.


…except that:

  1. it will be part of an Italian university MOOC series and thereby not open to everyone, but only to students of the academies that enroll – or to individuals who pay a fee (no idea how much), e.g. https://www.pok.polimi.it/; and
  2. … it will be in Italian :frowning:

These constraints come with the commission.
Still, I don’t want to mess things up too much, so an informal eye on a couple of patches would be amazing…

:heart:

I’ve started considering teaching classification using the MLP Regressor and one-hot encoding. I think it’s doable, and students can expand on it in various ways – including multi-hot encoding. A softmax implementation would be a great extension of this.
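
To make the idea concrete, something like this is what I mean (a toy Python sketch – the labels and numbers are invented, and in practice the targets and outputs would of course live in buffers/datasets):

```python
import numpy as np

classes = ["kick", "snare", "hat"]        # made-up labels, just for illustration

def one_hot(label):
    """Regressor target: 1 at the class index, 0 elsewhere."""
    v = np.zeros(len(classes))
    v[classes.index(label)] = 1.0
    return v

# After training, read the regressor's output vector back as a class:
output = np.array([0.1, 0.7, 0.2])            # hypothetical regressor output
predicted = classes[int(np.argmax(output))]   # -> "snare"
# ...or softmax the output and sample stochastically instead of taking argmax.
```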

Maybe, in addition to an activation function, it could be a buffer-to-buffer transform with a temperature argument?

You mean something that takes an n-point buffer and returns its softmax (with temperature)? FluidBufSoftmax(buf, outbuf, temp) kind of thing?

1 Like

That is what I’m thinking. That way it could be used on the output of an MLP, but could also be used to tweak a probability distribution over a vector of inverse distances (more similar = more likely to be sampled), or some other measure where one wants to sample from a distribution with some “temperature” control.
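
Roughly like this, say (a quick Python sketch of the sampling side only – here I get “closer = more likely” by softmaxing the negated distances; in practice the distances might come from a KDTree query and sit in a buffer, but the maths is the same):

```python
import numpy as np

def sample_by_similarity(distances, temperature=1.0):
    """Softmax over negated distances: closer items get higher probability.
    Temperature > 1 flattens the choice, < 1 sharpens it towards the nearest."""
    d = np.asarray(distances, dtype=float)
    logits = -d / temperature
    logits = logits - logits.max()     # numerical stability
    p = np.exp(logits)
    p = p / p.sum()
    return np.random.choice(len(p), p=p)

# e.g. with distances from a nearest-neighbour query: the closest entries are
# picked most often, but a higher temperature occasionally lets others through.
idx = sample_by_similarity([0.2, 0.5, 1.3, 2.0], temperature=0.8)
```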