That’s a slightly edited copy/paste of what I initially posted on Slack, but I wanted to move as much of the context over into this post as possible.
In thinking on this further, individual descriptors such as loudness and timbre are quite meaningful, but generally describe a fixed point in time (or rather, a window in time). And although we normally take some kind of (unweighted(!)) mean of that period as a kind of “summary”, this says very little about morphology, or how things change over time.
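To make that concrete, here is a minimal sketch of the problem. The per-frame loudness values are made up for illustration, and `summarise` is just a hypothetical name for the usual unweighted mean. Two sounds with opposite shapes end up with an identical summary:

```python
from statistics import mean

# Hypothetical per-frame loudness values in dB (made up for illustration).
swell = [-40.0, -30.0, -20.0, -10.0]  # fades in
decay = [-10.0, -20.0, -30.0, -40.0]  # fades out


def summarise(frames):
    """Unweighted mean of the frames in the analysis window."""
    return mean(frames)


# Both sounds collapse to the same number, so the shape is lost.
print(summarise(swell), summarise(decay))
```
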
The AudioGuide solution to this is to take individual analysis frames, so there isn’t a “summary” of time as such, and then compare frame-by-frame what you’re looking for. I like that idea, but given my context (short analysis window/latency with long files), it’s a luxury I often don’t have.
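As a rough illustration of the frame-by-frame idea (this is only my sketch of the general principle, not AudioGuide's actual code, and `framewise_distance` is a hypothetical name), comparing descriptor sequences frame against frame keeps the shape in play:

```python
import math


def framewise_distance(target, candidate):
    """Root-mean-square difference between two equal-length descriptor sequences."""
    if not target or len(target) != len(candidate):
        raise ValueError("sequences must be non-empty and the same length")
    squared = sum((t - c) ** 2 for t, c in zip(target, candidate))
    return math.sqrt(squared / len(target))


# Hypothetical per-frame loudness values in dB (made up for illustration).
target = [-40.0, -30.0, -20.0, -10.0]          # fades in
similar_shape = [-38.0, -29.0, -21.0, -12.0]   # also fades in
reversed_shape = [-10.0, -20.0, -30.0, -40.0]  # fades out

print(framewise_distance(target, similar_shape))
print(framewise_distance(target, reversed_shape))
```

The catch is that this assumes both sequences have the same number of frames, which is exactly what a short analysis window matched against long files doesn't give you.
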
Which is what led me to think of “E” (envelope) as perhaps an equally meaningful descriptor. I suppose that a better solution might be to have a morphology for each (macro)descriptor. So you’d have loudness AND envelope of loudness, but if I’m thinking of a low-dimensional space, is the envelope of loudness of the same importance as loudness? Perhaps it is. I don’t know.
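One possible reading of “envelope of loudness”, sketched under my own assumptions: take the least-squares slope of the loudness frames as a single extra number alongside the mean. The function name and the data are hypothetical, and a slope is only one of many ways to boil an envelope down:

```python
def loudness_slope(frames):
    """Least-squares slope of loudness against frame index (dB per frame)."""
    n = len(frames)
    if n < 2:
        return 0.0  # a single frame has no trajectory
    x_mean = (n - 1) / 2
    y_mean = sum(frames) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(frames))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical per-frame loudness values in dB (made up for illustration).
fade_in = [-40.0, -30.0, -20.0, -10.0]
fade_out = [-10.0, -20.0, -30.0, -40.0]

# Same mean loudness, opposite slopes: a two-number (loudness, envelope) pair.
print(sum(fade_in) / len(fade_in), loudness_slope(fade_in))
print(sum(fade_out) / len(fade_out), loudness_slope(fade_out))
```

Whether that second number deserves the same weight as the first is the open question.
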
But mainly I’m wanting to spitball and discuss here what may be a good way to have perceptually meaningful descriptors which have equal (conceptual and perceptual) weight in the overall scheme of things.
LPT is a good paradigm, I think, but in my case it would probably be more significant to differentiate between something periodic/pitchy and something aperiodic/noisy (or whatever), rather than to care whether something is an F or a G. Like, in the overall perceptual and creative space that I’m interested in, pitch is of low importance.
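A crude way to get at pitchy-versus-noisy without caring about the actual pitch is to look at how strong the best autocorrelation peak is, and throw away where it sits. This is a minimal sketch using only the standard library. The `periodicity` function, the lag range, and the test signals are my own assumptions, not what any particular analysis tool computes:

```python
import math
import random


def periodicity(signal, min_lag, max_lag):
    """Peak normalised autocorrelation over a lag range (near 1 = periodic, near 0 = noisy)."""
    n = len(signal)
    best = 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        head = signal[: n - lag]
        tail = signal[lag:]
        cross = sum(a * b for a, b in zip(head, tail))
        # Normalise by the energy of the two overlapping segments.
        norm = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
        if norm > 0.0:
            best = max(best, cross / norm)
    return best


# Hypothetical test signals: a sine with a 50-sample period, and uniform noise.
rng = random.Random(0)
sine = [math.sin(2 * math.pi * i / 50) for i in range(1000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

print(periodicity(sine, 20, 200))   # close to 1
print(periodicity(noise, 20, 200))  # much lower
```

The lag of the peak (which would tell you F versus G) is never returned, only its height, so the descriptor is about periodicity rather than pitch.
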
(This isn’t the case for straight mosaicking though, as you can definitely get some use out of pitch in that context, but that’s a separate discussion.)