Dimensionality reduction for temporal organization

Greetings,

Sorry this is not necessarily a code share as the code is a bit noodly right now, but here’s a project I did last month using UMAP and a Traveling Salesperson Problem solver to organize audio slices into 1D paths and then reconstruct them through time.

  1. I started with a bunch of audio and video recordings of some music that I composed.
  2. I sliced all that audio into 100 ms slices.
  3. I analyzed those slices and extracted audio descriptors (714 columns by the time the 1st derivative and all statistics were done!).
  4. I reduced that to 11 principal components (and was able to retain 99% of the variance!).
  5. I then used two ways of reducing this further to 1D: the first was UMAP down to 1D, and the second was a TSP solver algorithm (in Python… python-tsp · PyPI, FWIW).
  6. Then, with each of these 1D sequences of slices, I reconstructed the audio-video file (a rough sketch of this pipeline follows below).
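For the curious, here’s a minimal sketch of what steps 4–6 could look like in Python. This is not my actual code (most of the analysis was done with the Fluid tools); it swaps in scikit-learn for the PCA, umap-learn for the UMAP, and python-tsp for the tour, and the descriptor file name is a placeholder:

```python
import numpy as np
import umap
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from python_tsp.distances import euclidean_distance_matrix
from python_tsp.heuristics import solve_tsp_simulated_annealing

# One row per 100 ms slice, one column per descriptor/statistic
# (714 columns in my case). File name is a placeholder.
features = np.load("slice_descriptors.npy")

# Step 4: standardize, then keep enough principal components for 99% variance
X = StandardScaler().fit_transform(features)
reduced = PCA(n_components=0.99).fit_transform(X)  # float = variance fraction

# Step 5a: UMAP straight down to 1D; sorting along that axis gives an ordering
embedding = umap.UMAP(n_components=1).fit_transform(reduced)
umap_order = np.argsort(embedding[:, 0])

# Step 5b: treat each slice as a "city" and ask a TSP heuristic for a short tour
distances = euclidean_distance_matrix(reduced)
tsp_order, _ = solve_tsp_simulated_annealing(distances)

# Step 6: either ordering then dictates how the slices are re-concatenated
```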

The TSP-solver solution is quite jittery, frequently jumping between different audio files and not staying in one place for too long:

One can kind of see that in this plot. Each dot is a 100 ms slice of audio. The X position shows the slice’s time position in the reconstructed sequence (the YouTube link). The Y position shows what source file it came from and where in that file (bottom of the “file’s bar” is at the beginning of the file, top is at the end).
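If anyone wants to make a similar plot, it’s only a few lines of matplotlib. A sketch, where the per-slice metadata arrays are hypothetical (however you happen to track which file each slice came from):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-slice metadata, in reconstructed-sequence order:
# which source file each slice came from, and its 0-1 position within that file
file_index = np.load("slice_file_index.npy")
file_position = np.load("slice_file_position.npy")

x = np.arange(len(file_index))   # time position in the reconstructed sequence
y = file_index + file_position   # each source file occupies a unit-high "bar"

plt.scatter(x, y, s=2)
plt.xlabel("slice position in reconstructed sequence")
plt.ylabel("source file (bottom of each bar = start of file)")
plt.show()
```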

The UMAP 1D solution is smoother, showing crossfades of density as it transitions from one source file to the next:

That can also be seen here:

While creating the final piece, I selected some passages from each of these but didn’t end up using all the material, just the passages I liked! You can watch the whole thing here (password: framerate). The piece isn’t supposed to be public yet… so it’s behind a password…

I hope it’s interesting, maybe creates some ideas or inspiration!

T

4 Likes

This is epic, thanks! I’m on a train so I cannot watch the videos yet, but I might tease you in the other thread re: your last paragraph here :slight_smile: What I like very much is the transparency of the process, where data mining is investigative and put against itself.

A few thoughts right away, on the more ‘objective’ matters:

#4 means that you had a lot of redundancy in #3, but @groma can confirm whether what I say is true.

I’m impressed by how well UMAP has fared - I presume you tweaked the local-vs-global topology priorities a bit? Could you share how you managed to tweak it?

Were you not tempted to do a more perceptual segmentation? Did you try it and abandon it? I’m very curious about the investigative process, where you stalled and how you curated that clean list. Maybe it was just as is from the outset, but maybe not?

Sorry for the enthusiasm and barrage of questions :smiley:

The video started: people around me in the train looked at me suspiciously - I probably clogged their bandwidth - or their eardrums with unusual music! :rofl:

Yes! So PCA was an important step! Lately I’ve been doing a lot of big analyses like this and then letting PCA get rid of the junk for me, especially since, in these cases, it doesn’t much matter which raw descriptors are most influential; what I’m really after is the variance between slices!

I did no tweaking, I just used the Fluid default values. The output was really compelling so I ran with it!

I didn’t try perceptual segmentation in this case. I did that elsewhere in the piece, but here I wanted it to be more temporally consistent, like the section was clearly a tour of the slices, of the sounds in the recordings, without the capital-M Musical content they would convey. That way the musicality of the section comes out of the sequence of slices, out of the algorithm, more than out of the musicality of the slices’ content.

Also, I had done something similar before, with just audio, not also video, and liked the results with 100 ms slices, so I was following after that.

It was pretty much as is. Like I said, I did something similar previously (a few years ago, actually I think you and @a.harker heard it in Chicago?) so I was essentially recreating that process with the Fluid tools. That first time perhaps the list was less clean but I don’t recall the investigative process from then to try to retell it.

2 Likes

This is super cool in general. I specifically like the travelling salesman restriction.

This is super interesting as well, and though we ended up chatting about this in the last Thursday chat, I wanted to bump the thread version of it.

I’ve been chasing that dragon of having conceptually meaningful sub-groups of descriptors (LTEp/LTP), but that gets problematic when you’re combining sources in which some characteristics aren’t as relevant/meaningful. In my case this is often pitch, but if variance is what one is trying to maximize, it may not make sense, or may be sub-optimal, to conceptually group things ahead of time.

Maybe some kind of hybrid approach where you have a reduced descriptor/statistics space (similar to what you have here), where it doesn’t matter what’s in it, it’s “a good representation of the stuff”. And then, alongside that, some perceptually meaningful descriptors (loudness, pitch, centroid, etc.) that can be used to bias/skew the query (e.g. return the nearest match from the pile of goop that also has loudness > -6).

That’s some (more) functionality that isn’t presently possible with the querying tools we have; it would take a kdtree that can also resolve parallel logical queries (e.g. find the nearest neighbor from columns 5-20 (which have been pre-fit) && some other query on column 2).
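This isn’t FluCoMa’s API, but as a sketch of the idea: with scipy’s KDTree, one way to fake the parallel query is to over-fetch neighbours from the pre-fit columns and then filter on the perceptual column. The column layout and the loudness threshold here are made up for illustration:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical corpus: rows are slices; columns 5-20 hold the pre-fit reduced
# space, column 2 holds loudness in dB (made-up layout for illustration)
data = np.load("corpus.npy")
tree = cKDTree(data[:, 5:21])

def nearest_with_condition(query_point, min_loudness=-6.0, k=50):
    """Nearest neighbour in the reduced space that also passes a loudness test.
    Over-fetch k candidates, then keep the first that satisfies the condition."""
    dists, idxs = tree.query(query_point, k=k)
    for d, i in zip(dists, idxs):
        if data[i, 2] > min_loudness:
            return int(i), float(d)
    return None  # nothing in the first k passed; raise k and try again
```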

2 Likes

Oh yeah, I remembered I wanted to bump this bit as well. From our discussion you mentioned you had to pull some numbers out of the .json and manually compute the variance.

I could be wrong here, but I vaguely remember this being something that was going to be added (the ability to request an amount of variance, rather than a specific number of dimensions). Or maybe that was the error value being returned from UMAP, or something else.

Either way, this seems like a super useful way to use PCA, so it would be nice to have it as a native feature, or as a pre-baked abstraction layer to get the same results. Perhaps this is what @weefuzzy was building towards with his variance/novelty diagram stuff from a bit ago.

1 Like

Yes, I gestured at it, although the main focus was on plotting the correlations between input features.

Getting the number of PCs corresponding to a proportion of the input variance is easy enough using the values from dump, because these are always properly ordered. One just needs to accumulate the list, normalize it so it goes 0-1, and then find the index that accounts for the requested fraction. Here’s something quickly hacked from my own personal abstractions for doing the list accumulation and index finding.


----------begin_max5_patcher----------
1974.3oc6asziahDD9r8uBDZusdY6m.czdXWsR648dTjUaL1CIX.AMSljnr+
129AXCd.L9AwSRFOR1i6hltput5pq5qweY9L6UoOEVXa8Fq2ZMa1WlOaltIU
Cyp99L6c7mBh4E5KyNI7ioqdu8BiHQ3SBcyYVIo46paNobWZoHNTn6Crp0Lt
H3gnjsKyCCDlwD5Sc.Krfd.GWJigfPpKzG6tvhvTBPHGf06p5u4VJ9TVnoy1
16EEsVqERM62X3FZQTRmJQXdkEVYhyr2DEG9XXdQTZRiqdlMOKqQyyZzEEt7
9T8Mxew9lhRLMA22Td3iQ6usGZlmKwBgDHJy0li8StD6C2mz0g4IkQ5NYZ7q
yq0I8jQBeWXQFOvzY0bVs3FnKPChPJQiklugnrCXpb1dabZvGB03Gntwzrvj
njr7vhvDAWTo86EuNbCuLVrbSZhnH5yZU.Jmo5R9lJUrSgJiPq++UdDOduEr
MOZcZhRIZMWnZtd3jFmwyg1zXzWQBOqiNKcMj3ROBKjFYYwJdtZpZUrdDP0B
EoowsEsuewgaDUhyhRRNBEEoY8KLOZ6CCz2UoRg6F5dqkTrrLwHcozoPrrf+
XazVviiqVx191+DOIZGWDJhLSAHvdggIbog9PQPdZbbK60H4wNjrV5kGD9wn
0hGzCTSmA4kGkU6DYueVdcz1vBQ61D7sEsaoP7ICn2noxUUqhWJB2kEKsh1W
fb4QTgn3gzOVTcg0NZMAfCw8ZtptY7uVsOTbv1wBeL7orbqeYCz52kuir9yh
.dLOWsl1B1rGwQIgAokIhl9a8F+ruXnTfdg.Fq+..UefatrX3XmGG+DgraqI
Ghghpa+qymW+OKt832misJJ2Y2KdfFFOfXUnHKlN9.w83sQ5ANVLDj.+l.I5
6p804FPzui2+9E5D3x50hAMDDjtaWXR60v56Tx5vmZr20MAdLZe+3CXTKSXn
A.nFHfGnWD.dsHvgMrTQ.5ILj1DTx6FZJRKyCpmGqcRsZaNxPuhnj86e+1CK
vsfstvcQqyRiRDUpBj.cnxqAY9nw2rd2nlPOWcW4tMdc+Hm570lz4f2YUC1u
pAlDUqCvnaUS4KqtvNcEm2L4RaUBDqWZRVZIWHxiVUJLtnMSW9r1TWlg4Jd7
Q6D20d9yOnb52MfzUUUBOHn7hKKAI2LEiHL.jPnHDiR8VXQ8uzBSHuVXR2El
f0eC6gdsvjWKL40BSlrBSDVwWY5kPPUVTiLq59xtB62e1UeaJxPXsxZyEWig
qoXKeOGeWr7OhK12GPwr5LvOM1rhmrUU0wl3TtnefhcuAJY0XaySKytXrx34
f77eNVQo2fpyHf6cAqxsKyu3EVUtRt.GpOxC6Qco.Bf56qvrqbYFAdu8djaU
K2SasUwUTYKDS6c0l2UCQnWBQhhkaL0O9fGUnYo2By2CgQ.JCQ.XBTEulbdQ
iNcDIB9dCX5b5GDvN0RNjimrhVjrv1m4P4BGGfMLHA6OrM9EHiQpWjJPffuA
7GQ+Qk+HLriswNMaRteuwljJAsQwsfJbPObKb8LbfGK4KP1TqEvwpEvIRKHi
keO7ThEDzYNiLDKiDEuhHDnis0QNntkMU1Edr96foDcwizSqCmgiQW.xPCgq
CkRbIdDOJwGBvXM5R6TzTZVnwXVnIDbgrWBSwzwpE0QfmlYDvH0B2e3oNtdO
2i4N9e+6ht4MFL.uw.S4Ht5BVLmHKpQZAsnCF0Mcv2PS3ex4AJb1JcikHUvi
sdjmGwSjtBmqkU8X5Pcnxfv.2J9vcIpvHvAMS3MyL2EVTv2F9LyDc176Woxl
oIJ3BIz2yuSSCcAllrZhj9NmhJ5UTcHZm7hTlly.SUDRumcATSF9or0EUkC1
kM6t2lkVbs5rmTU6Mo463Z8vcuBlKW1JByWZ3nsoe1U6Qz6w9rQlw7Y6UTAb
XnrNZluOzCSPxcFMjybRXKJQzMh41sWxqG6ioTaYzjWO1mWO1mWO1mI6Xeph
TK203J4vuJ8FO2yghrEsCM9Llxt2j2+qVvKlAMlAITL+3iH9Lf4ET1FyTa6o
vngfFp28FapS43Jca78bftDlmeM9n3YcjG8iBfNgKDEeuwoCOwm+wvOwmWxQ
tJSV0GvPR.T+x20WkIt6UdtGTxOcOSi3WdbRecvCxDO9VgOjaC9fd4vYeU7Y
0QrJiOS8LqfXdjQwYO86LN6U6WLNRcnSIAWikCyZ0cZzBxHwB5Td9EtikUY3
Tx64XQBxThDjyQKf+D9rplKCqEZstbW1vz.MvuhNY40XDi4BnxTsXPYdBdjw
w2SmLW.uc7a0iMuNJP3HqXjG7AqG4wkgEu4h+IDhYNLFgQ.9tdPrmlsqK82P
HzaxIsr+I5Q7bIScPXOHBxpeISB.do1JZpmk2DWFs1IKf+ecav3SZvdTGWY9
u9.jrFKB8P4C3gL2XYQ2GddV5yKe+OWzVrGgaPoYO7B0uugdXZuytA9NNpYE
H97P1sN8891ynw9mOahZrijhcxSNRJhtqiIewiDCOhQhQtA1DzaLiD9FLRLx
HFIMDesnWqSPsOGh1dMGcfvHYYBTySDlJCXO.Aiot99dpxIYjdjcsSEnQ.P2
hYBW+wNSzdjLKUOhUc0fbDa5GwjdGrn2OC5GydtNAl9XMe9Wm++fJlmmD
-----------end_max5_patcher-----------
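And for anyone working outside Max, the same accumulate-and-find logic is only a few lines of NumPy. A sketch, assuming the per-component variances have already been pulled out of the dump (the numbers below are just an example):

```python
import numpy as np

# Per-component variances from the PCA dump (always sorted largest-first);
# example values only
values = np.array([5.2, 2.1, 0.9, 0.4, 0.1, 0.05])

cumulative = np.cumsum(values) / np.sum(values)      # accumulate, scale to 0-1
n_keep = int(np.searchsorted(cumulative, 0.99)) + 1  # first index reaching 99%
print(n_keep)  # number of PCs needed for 99% of the variance -> 5 here
```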
2 Likes