hello
So for me your points 1 and 2 are related. I’m always trying to find a scale where a distance of x feels the same. FluidPitch can output MIDI cents (so 1 semitone can be 1 LU, and the full range of pitch, 80-90 semitones, is similar to the full range of usable loudness, 80-90 LU).
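To make the "1 semitone = 1 unit" idea concrete, here is a minimal sketch of the standard Hz to MIDI note conversion (A4 = 440 Hz = note 69). This is not FluidPitch's code, and the names `hz_to_midi` and `pitch_distance` are made up for illustration.

```python
import math

def hz_to_midi(freq_hz, a4_hz=440.0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)

def pitch_distance(freq_a_hz, freq_b_hz):
    """Distance in semitones, so equal musical intervals give equal distances."""
    return abs(hz_to_midi(freq_a_hz) - hz_to_midi(freq_b_hz))

# An octave is 12 semitones wherever it sits, unlike a difference in Hz.
low_octave = pitch_distance(110.0, 220.0)
high_octave = pitch_distance(880.0, 1760.0)
```

On this scale a distance of 1 is one semitone, which can then sit next to a loudness axis where a distance of 1 is 1 LU.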
Once I have that, timbre is more ‘fun’ because MFCCs are multivariate (the dimensions are needed together to mean something) and do not scale perceptually in a simple way (+/- 1, or +/- the range, does not have the same perceptual impact on each dimension) - I have heard of a paper from IRCAM, and/or from McAdams, on this, and I think that this is what @naiv40 implemented in the software he presented here yesterday.
My trick is to use the full count of dimensions to explore proximity. Then indeed, I try to simplify it. @rodrigo.constanzo took a more data-science approach at one point, throwing in all the dimensions in the world and then using PCA to try to make sense of them… but that was not super successful IIRC. He will remember which of his (fantastically documented research) threads it was in, I hope.
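As a rough sketch of "use all the dimensions to explore proximity": z-score each dimension, then look for the nearest neighbour by Euclidean distance across every dimension at once. The data and the names `standardize` and `nearest` are made up for illustration. Standardizing only evens out the numeric ranges; it is an assumption on my side and not a perceptual scaling, and the later simplification step (PCA or otherwise) is not shown.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so no single dimension dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns: divide by 1
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def nearest(rows, query_index):
    """Index of the closest other row, measured over all dimensions together."""
    query = rows[query_index]
    best_index, best_dist = None, math.inf
    for i, row in enumerate(rows):
        if i == query_index:
            continue  # skip the query itself
        d = math.dist(query, row)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index

# Made-up descriptors, one row per sound, with very different numeric ranges.
sounds = [
    [1.0, 200.0, 0.1],
    [1.1, 210.0, 0.1],
    [5.0, 205.0, 0.9],
    [5.2, 900.0, 0.8],
]
z = standardize(sounds)
raw_match = nearest(sounds, 2)  # dominated by the large-valued second column
z_match = nearest(z, 2)         # every column contributes on a comparable scale
```

With this toy data the raw and standardized searches pick different neighbours for sound 2, which is the point: the match depends heavily on how each dimension is scaled before measuring distance.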