Ways to test the validity/usefulness/salience of your data

As @tremblap and you said, this does sound like you’d be rotating and/or “flipping” the space to get certain descriptors roughly aligned with certain axes. One approach might be to analytically find the “brightest” and “darkest” (or whatever) points in your dataset, then rotate the space so the relationship between those two lies along the Y axis (for example). Then find the “noisiest” and “smoothest” points in your dataset (whatever that means) and further rotate or flip the dataset so that relationship runs roughly along the X axis. There will likely be some tradeoffs no matter what (these dimensions may not, and likely will not, be orthogonal!).
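Here is a rough sketch of that rotate-then-flip idea in Python with NumPy, just to make the geometry concrete. Everything in it is an assumption for illustration: the 2D points, the `brightness` and `noisiness` values, and the function names are made up, and the second step only flips (mirrors) X rather than rotating again, since a further rotation would undo the Y alignment.

```python
import numpy as np

def rotate_pair_to_y(points, low_idx, high_idx):
    """Rotate 2D points about their mean so low -> high points along +Y."""
    pts = np.asarray(points, dtype=float)
    center = pts.mean(axis=0)
    d = pts[high_idx] - pts[low_idx]
    # Angle needed to turn the low -> high direction onto +Y (pi/2).
    theta = np.pi / 2 - np.arctan2(d[1], d[0])
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return (pts - center) @ rot.T + center

def flip_x_if_needed(points, low_idx, high_idx):
    """Mirror the X axis if the 'high' point sits to the left of the 'low' point."""
    pts = np.asarray(points, dtype=float).copy()
    if pts[high_idx, 0] < pts[low_idx, 0]:
        center_x = pts[:, 0].mean()
        pts[:, 0] = 2 * center_x - pts[:, 0]
    return pts

# Hypothetical data: 50 points from a 2D reduction plus two descriptors per point.
rng = np.random.default_rng(0)
xy = rng.random((50, 2))
brightness = rng.random(50)
noisiness = rng.random(50)

dark, bright = np.argmin(brightness), np.argmax(brightness)
smooth, noisy = np.argmin(noisiness), np.argmax(noisiness)

aligned = rotate_pair_to_y(xy, dark, bright)
aligned = flip_x_if_needed(aligned, smooth, noisy)
```

Rotations and mirror flips keep all pairwise distances intact, so the neighbourhood structure of the plot survives; the X alignment is only as good as the data allows, which is the tradeoff mentioned above.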

Instead of finding the “brightest” or whatever, you could try to find the “centroid” of brightness with something like a weighted average.
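A minimal sketch of the weighted-average version, again in Python/NumPy and with made-up names and data: each point's position is weighted by its brightness to get a “bright” centroid, and by the inverted brightness to get a “dark” one. It assumes the weights are non-negative and not all equal (otherwise the dark weights sum to zero).

```python
import numpy as np

def descriptor_centroids(points, values):
    """Return (low_centroid, high_centroid) of 2D points weighted by a descriptor."""
    pts = np.asarray(points, dtype=float)
    w = np.asarray(values, dtype=float)
    high = np.average(pts, axis=0, weights=w)           # pulled toward high values
    low = np.average(pts, axis=0, weights=w.max() - w)  # pulled toward low values
    return low, high

# Hypothetical example: two points, the second three times "brighter".
pts = np.array([[0.0, 0.0], [4.0, 0.0]])
brightness = np.array([1.0, 3.0])
dark_c, bright_c = descriptor_centroids(pts, brightness)
```

The direction from `dark_c` to `bright_c` can then stand in for the brightest/darkest pair when working out the rotation, which should be less sensitive to a single outlier.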

For what it’s worth, approaching something like this in the past, I’ve had success with doing a grab bag of analyses, like you describe, running BufStats on all of it, standardizing, then PCA (keeping 90% of the variance or so, like you say), then UMAP.
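For anyone who wants to see the middle of that chain spelled out, here is a sketch of the standardize-then-PCA part in plain NumPy. It is not the FluCoMa implementation, just the same idea: z-score every column, then keep as many principal components as it takes to reach about 90% of the variance. The input matrix is random stand-in data (one row per slice, one column per statistic), and the function names are my own.

```python
import numpy as np

def standardize(X):
    """Z-score each column (zero mean, unit variance)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (X - X.mean(axis=0)) / std

def pca_keep_variance(X, keep=0.9):
    """Project X onto the fewest principal components explaining >= keep variance."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(explained), keep)) + 1
    k = min(k, len(S))
    return Xc @ Vt[:k].T, explained[:k]

# Stand-in for "BufStats on a grab bag of analyses": 200 slices x 12 statistics.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 12))
stats = latent @ mixing + 0.1 * rng.normal(size=(200, 12))

reduced, explained = pca_keep_variance(standardize(stats), keep=0.9)
# `reduced` is what would then go into UMAP to get down to 2D.
```

Standardizing first matters because PCA is driven by variance: without it, whichever descriptor happens to have the biggest numeric range dominates the components.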

I’m not sure. I think I have been normalizing the output of UMAP before using Grid, but it will probably give you results either way. They may be different, though, because normalizing the output of UMAP stretches each axis independently, which changes the distance relationships between points.
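A tiny made-up example of that stretching, assuming min-max normalization per axis (the three points are invented, not real UMAP output):

```python
import numpy as np

def minmax(points):
    """Scale each axis independently to the 0..1 range."""
    pts = np.asarray(points, dtype=float)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on flat axes
    return (pts - lo) / span

# Point 1 is far from point 0 along X; point 2 is close to point 0 along Y.
pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 1.0]])
norm = minmax(pts)

before = (np.linalg.norm(pts[1] - pts[0]), np.linalg.norm(pts[2] - pts[0]))
after = (np.linalg.norm(norm[1] - norm[0]), np.linalg.norm(norm[2] - norm[0]))
```

Before normalizing, point 1 is ten times farther from point 0 than point 2 is; afterwards both are at distance 1, so whatever Grid does with those distances will change accordingly.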

A little bird whispered to me that the data for making this may become more readily available.

I haven’t continued pursuing this idea (PCA/SVM thread). Also, @weefuzzy offered a bit of a lit review if it’s interesting to anyone. Since you’re generally interested in determining how these slices are different from / similar to each other, using the principal components will get you farther than plucking the same number of original descriptors. Yeah, you’ll lose perceptually relevant axes, but the rotation above may get you back to a good place.

Also, for what it’s worth, since you’re interested in having the axes be perceptually relevant, you could try a 2D plot of whatever you’re interested in (pitch and loudness, or centroid and noisiness, or whatever), then put that in fluid.grid~ and see how performative and musical that feels.

With whatever plot you have, you could totally just make the ERAE display color based on whatever analysis you want. (I know you know that).

Or you could do some clustering / sorting before the fact. Break up your corpus into subcorpora using fluid.datasetquery~ for a manual approach or fluid.kmeans~ for an unsupervised approach. Then run your analysis -> grid pipeline on each subcorpus and put the “subgrids” in a bigger grid, such as in quadrants or whatever. You could even rotate the subgrids to try to get them lined up in some important way (I’m just riffing here now…)
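To make the “subgrids in quadrants” part a bit more concrete, here is a rough Python/NumPy sketch of just the tiling step. It assumes you already have a 2D position for every slice and a cluster label per slice (from k-means or a manual query); the labels below are invented, the clustering and the actual gridding are not shown, and it only handles up to four clusters.

```python
import numpy as np

# Lower-left corner of each quadrant in a 0..1 x 0..1 space.
QUADRANT_OFFSETS = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])

def tile_subgrids(points, labels):
    """Rescale each cluster's points to fill its own quadrant of the unit square."""
    pts = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    unique = np.unique(labels)
    if len(unique) > len(QUADRANT_OFFSETS):
        raise ValueError("this sketch only places up to four clusters")
    out = np.empty_like(pts)
    for quadrant, label in enumerate(unique):
        mask = labels == label
        sub = pts[mask]
        lo, hi = sub.min(axis=0), sub.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)
        # Normalize the cluster to 0..1, shrink to half size, move to its quadrant.
        out[mask] = (sub - lo) / span * 0.5 + QUADRANT_OFFSETS[quadrant]
    return out

# Hypothetical input: 80 slices with 2D positions and four cluster labels.
rng = np.random.default_rng(2)
positions = rng.random((80, 2))
cluster_labels = np.arange(80) % 4
tiled = tile_subgrids(positions, cluster_labels)
```

Which cluster lands in which quadrant is arbitrary here (sorted label order); choosing that order, or rotating each subgrid first, is where the musically interesting decisions would be.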
