LTEp demo patch

After spending a few days building the first step of the LTEp idea, I wanted to share some code showing where it’s at.

LTEp(2).zip (80.7 KB)

You point it to a folder of samples (individual samples, no segmentation here) and let it run. Once it’s done it will plot the composite space, along with each individual sub-space.

Each individual subspace is as follows:

  • Loudness (4D) - mean, std, min, max → robust scale
  • Timbre (4D) - loudness-weighted MFCCs (20, with 19 kept), mean, std, min, max → standardize → 4D UMAP → robust scale
  • Envelope (4D) - deriv of loudness mean, deriv of loudness std, deriv of loudness-weighted centroid mean, deriv of loudness-weighted rolloff mean → robust scale
  • Pitch (2D) - confidence-weighted median, raw confidence → robust scale
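
For anyone who prefers reading this as code, here’s a rough Python sketch of that per-sample feature extraction, with librosa standing in for the FluCoMa descriptor objects. The window sizes, weighting, and the pitch median below are approximations of the list above, not what the patch actually does:

```python
import numpy as np
import librosa

def subspace_features(path, sr=44100, frame=256, hop=64):
    """Rough per-sample features in the spirit of the four subspaces above.
    librosa stands in for fluid.loudness~ / fluid.mfcc~ / fluid.spectralshape~ /
    fluid.pitch~, so values will not match the patch; this is just the shape of the idea."""
    y, sr = librosa.load(path, sr=sr, mono=True)

    # Loudness proxy (RMS in dB) per 256-sample frame, plus per-frame weights
    rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]
    loud = librosa.amplitude_to_db(rms + 1e-9)
    w = rms / (rms.sum() + 1e-9)

    # Loudness (4D): mean, std, min, max
    loudness = np.array([loud.mean(), loud.std(), loud.min(), loud.max()])

    # Timbre: loudness-weighted MFCC stats (76D here; standardize -> UMAP -> 4D happens later)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, n_fft=frame, hop_length=hop)[1:]
    m_mean = (mfcc * w).sum(axis=1)
    m_std = np.sqrt((((mfcc - m_mean[:, None]) ** 2) * w).sum(axis=1))
    timbre = np.concatenate([m_mean, m_std, mfcc.min(axis=1), mfcc.max(axis=1)])

    # Envelope (4D): derivatives of loudness and of loudness-weighted shape descriptors
    cent = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=frame, hop_length=hop)[0]
    roll = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=frame, hop_length=hop)[0]
    envelope = np.array([np.diff(loud).mean(), np.diff(loud).std(),
                         np.diff(cent * w).mean(), np.diff(roll * w).mean()])

    # Pitch (2D): confidence-weighted median pitch + an overall confidence value
    f0, _, conf = librosa.pyin(y, fmin=50, fmax=2000, sr=sr)
    f0 = np.nan_to_num(f0)
    order = np.argsort(f0)
    cdf = np.cumsum(conf[order]) / (conf.sum() + 1e-9)
    idx = min(np.searchsorted(cdf, 0.5), len(f0) - 1)
    pitch = np.array([f0[order][idx], conf.mean()])

    return loudness, timbre, envelope, pitch
```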

There are still improvements to make all over the place, not least optimizing the scaling and things like that, but for now everything is robust scaled at the end. Prior to that, each subspace is massaged a bit to weight its internal components (e.g. I divide the pitch output by 127 so that confidence matters a lot more, and I scale the deriv of loudness up so it is more in the range of the spectral derivatives).
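
In sklearn terms, that final stage looks roughly like this: massage each subspace, robust scale it, then concatenate everything into the composite space. Only the 127 divisor comes from the description above; the loudness-derivative gain and all names are illustrative stand-ins:

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

def scale_subspaces(loudness, timbre4, envelope, pitch, deriv_gain=100.0):
    """Rows are samples; timbre4 is the 4D UMAP output of the timbre stats.
    deriv_gain is a made-up stand-in for scaling the loudness derivative up
    towards the range of the spectral derivatives."""
    pitch = pitch.copy()
    pitch[:, 0] /= 127.0                 # pitch (MIDI) / 127 so confidence matters more
    envelope = envelope.copy()
    envelope[:, 0:2] *= deriv_gain       # boost the two loudness-derivative columns

    scaled = [RobustScaler().fit_transform(x)
              for x in (loudness, timbre4, envelope, pitch)]
    return np.hstack(scaled)             # the 4+4+4+2 = 14D composite space
```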

The visualizers are a second layer of abstraction, as most of those are 4D spaces that then get further reduced to 3D for visualization. The Timbre space in particular does back-to-back UMAP-ing, going down to 4D and then down to 3D again after that. The Pitch space goes the other way, getting dimensionality-increased from 2D up to 3D. I’ve also plotted the raw 2D version and you get a more vanilla x/y-looking plot, rather than this banana, but the overall mapping and direction are similar.
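
The viz layer is essentially just another reduction pass; a minimal sketch, assuming umap-learn with default parameters (not whatever the patch uses):

```python
import umap

def viz_coords(subspace, seed=0):
    """Map any subspace to 3D purely for plotting: 4D -> 3D for most of them,
    and 2D -> 3D for the pitch space (UMAP is happy going up in dimensionality too)."""
    return umap.UMAP(n_components=3, random_state=seed).fit_transform(subspace)
```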

It’s interesting to see how effectively the spaces are organized internally, and then to compare that to the more complex 14D space that makes up the composite/overall space.

Oh, I almost forgot to mention: all of this is computed on only 256 samples. That shouldn’t come as a surprise to anyone, as I don’t shut up about it, but what’s significant here is how differentiated things are even in that small time frame (for the audio I’m feeding it, at least).

The main analysis abstraction should be fairly easy to repurpose for real-time matching, which will be one of the next steps, as I think that will yield interesting results.
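
The matching end of that is basically a nearest-neighbour lookup in the composite space. A minimal sketch, with sklearn standing in for fluid.kdtree~ and all names illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_matcher(composite):
    """composite: (n_corpus_samples, 14) array from the offline analysis."""
    return NearestNeighbors(n_neighbors=1).fit(composite)

def match(matcher, frame_features):
    """frame_features: one 14D vector from the same analysis/scaling chain,
    computed on an incoming real-time window."""
    dist, idx = matcher.kneighbors(np.asarray(frame_features).reshape(1, -1))
    return int(idx[0, 0]), float(dist[0, 0])   # corpus index + distance
```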

I also plan on analyzing multiple timeframes on each pass, so I’d have 256-sample and 4410-sample windows, along with “the whole file”, to then incorporate prediction and, hopefully, weighting in the future.


This looks soooo cool, I look forward to trying it (after my Notam talk with @saguaro and @natasha for @balintlaczko though :slight_smile:).


Whoops, I realized I was visualizing the pre-robust-scaled versions of the subspaces. The patch in the first post has been updated, and the viz now looks like this:

This is good fun, and very slick. I have a few questions and a few ideas/suggestions, if you want to hear them?


Yeah totally. It’d be good to tweak/massage things before I get further into the process of adding other stuff.

  1. The first thing I wanted to know is why you used standardisation before the UMAP in the timbral space. Did you try it directly (unscaled), or with robust scaling instead?

  2. Trying the latter, I dug into the patch and found the bit I needed to change, and also found the message that needs pressing for the stuff to be re-UMAP-ed, which I think should be exposed because…

  3. UMAP has a lot of parameters to tweak, and the number of iterations is quite low so I get a lot of variance in my case, so exposing those at the top would be fun too…

  4. and then, in line with our paper, I wanted to replace UMAP with PCA and MLP-autoencoding, just for shits and giggles…

Then I started wanting to mess about with the timbral space reduction, but that is my own pet stuff. So there we are: a few additions to try. I very, very much like the subspace visualisation idea, and I think I’ll go and continue to polish the visualiser with the features I promised to add. Nice one again and thanks for sharing!

I initially had fluid.robustscale~ at the start of everything, but then it seemed weird to robustscale → umap → robustscale again.

Are MFCCs happy to just be chucked into UMAP ‘as is’?
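
To put the options side by side outside of Max, it’s something like this (umap-learn/sklearn, with made-up parameters):

```python
import umap
from sklearn.preprocessing import StandardScaler, RobustScaler

def reduce_timbre(mfcc_stats, pre="standardize", seed=0):
    """mfcc_stats: (n_samples, n_stats) MFCC summary stats.
    pre = "standardize", "robust", or "raw" decides what gets fed to UMAP."""
    if pre == "standardize":
        x = StandardScaler().fit_transform(mfcc_stats)
    elif pre == "robust":
        x = RobustScaler().fit_transform(mfcc_stats)
    else:                                   # "raw": MFCC stats chucked in as-is
        x = mfcc_stats
    reduced = umap.UMAP(n_components=4, random_state=seed).fit_transform(x)
    return RobustScaler().fit_transform(reduced)   # robust scale at the end either way
```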

I may expose that as a thing, but for the most part I was thinking I’d just bake in the settings I use and go with that all the time. I’ve not yet spent time with UMAP’s settings to see what works best here, but @jamesbradbury offered up this handy resource:
https://umap-learn.readthedocs.io/en/latest/parameters.html
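
For reference, those parameters map onto something like this in umap-learn (these are mostly the library defaults, apart from n_components; they are not what the patch uses):

```python
import umap

reducer = umap.UMAP(
    n_components=4,      # target dimensionality (4D for the timbre subspace)
    n_neighbors=15,      # local vs global structure trade-off
    min_dist=0.1,        # how tightly points can pack in the embedding
    n_epochs=None,       # None lets the library pick; raise it for more stable runs
    metric="euclidean",  # distance used in the original space
    random_state=42,     # fix the seed to tame run-to-run variance
)
```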

As I think I mention in the other thread, I want to have a set of analyses/processes that I can use on different corpora generically, so the corresponding realtime patch matches.

It would be great if the realtime analysis adapted/mirrored the settings from a corpus/dataset, but that’s not in the cards. So I’ve got to build everything twice+.

That being said, I’m working on a dict structure to save all the metadata, info, fits, etc… so that I have a single file I can load. At the moment I only store some superficial settings info (fftsettings, numframes, etc…) so that I can remember and know what’s happening, but it would be great to have something more comprehensive that can just populate a corresponding realtime analysis patch.
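
Roughly the kind of structure I mean, sketched as JSON (every field name here is made up, and in Max this would live in a dict rather than a file, but it’s the same idea):

```python
import json

# Hypothetical single-file metadata bundle for one corpus analysis pass.
analysis_meta = {
    "corpus": "snare_hits",
    "numframes": 256,
    "fftsettings": [256, 64, 512],
    "subspaces": {
        "loudness": {"dims": 4, "scaling": "robust"},
        "timbre":   {"dims": 4, "scaling": "standardize -> umap -> robust"},
        "envelope": {"dims": 4, "scaling": "robust"},
        "pitch":    {"dims": 2, "scaling": "robust"},
    },
    # paths to the saved scaler/UMAP fits that a realtime patch could load
    "fits": {
        "timbre_umap": "fits/timbre_umap.json",
        "composite_scaler": "fits/composite_robustscale.json",
    },
}

with open("lte_meta.json", "w") as f:
    json.dump(analysis_meta, f, indent=2)
```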

That should be fairly easy. I tried to name things somewhat generically (#0timbre_reduced as the dataset name). I’d be curious about your results with the MLP stuff, as I still struggle to get that to be useful at all.
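
For completeness, the PCA and MLP-autoencoder drop-ins mentioned above would look roughly like this in sklearn terms (in the patch it would be fluid.pca~ / fluid.mlpregressor~ instead, and the layer sizes here are made up):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

def reduce_pca(x, dims=4):
    """Straight swap: PCA instead of UMAP for the 4D reduction."""
    return PCA(n_components=dims).fit_transform(x)

def reduce_autoencoder(x, dims=4, seed=0):
    """Tiny MLP autoencoder: train input -> input, then read out the bottleneck layer."""
    x = StandardScaler().fit_transform(x)
    ae = MLPRegressor(hidden_layer_sizes=(16, dims, 16), activation="tanh",
                      max_iter=2000, random_state=seed)
    ae.fit(x, x)
    # forward pass through the first two layers to get the bottleneck activations
    h = x
    for W, b in zip(ae.coefs_[:2], ae.intercepts_[:2]):
        h = np.tanh(h @ W + b)
    return h                                # (n_samples, dims) encoding
```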