Ways to test the validity/usefulness/salience of your data

Yes, this is more or less my workflow. For what it’s worth, I didn’t do any loudness scaling in this project. It may not have been implemented yet when I did it, or it just wasn’t on my radar. At this point, though, I do think it’s a good idea for you to do the scaling.

You might also consider scaling the pitch descriptor by the pitch confidence descriptor. Maybe you’re already doing this?
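To make the pitch-confidence idea concrete, here is a rough sketch in plain Python. It shows two possible readings of "scaling pitch by confidence": multiplying each frame's pitch by its confidence, and taking a confidence-weighted mean over the frames of a slice. The function names and the assumption that confidence runs from 0 to 1 are mine, purely for illustration, so adapt it to whatever your analysis actually outputs.

```python
def scale_by_confidence(pitches, confidences):
    """Multiply each frame's pitch by its confidence (assumed to be 0..1)."""
    if len(pitches) != len(confidences):
        raise ValueError("pitches and confidences must have the same length")
    return [p * c for p, c in zip(pitches, confidences)]


def confidence_weighted_mean(pitches, confidences):
    """Average pitch over frames, letting confident frames count for more."""
    if len(pitches) != len(confidences):
        raise ValueError("pitches and confidences must have the same length")
    total = sum(confidences)
    if total == 0:
        # No frame has any confidence, so there is no meaningful pitch.
        return 0.0
    return sum(p * c for p, c in zip(pitches, confidences)) / total


# Hypothetical frames: the second one is an unpitched, low-confidence frame.
pitches = [60.0, 72.0]
confidences = [1.0, 0.0]
print(scale_by_confidence(pitches, confidences))
print(confidence_weighted_mean(pitches, confidences))
```

The weighted mean keeps the result in pitch units, while the per-frame multiplication pulls low-confidence frames toward zero, so which one fits depends on what you feed into the next stage.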

When you have lots of data like this, scaling is definitely a good idea. I’m curious why you’re applying a robust scaler to some descriptors and standardization to others. Have you done any tests that show one scaler working better or worse for certain descriptors? If it’s a lot of extra fuss, you could just put everything in one dataset and scale it with a single scaler.
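In case it helps to compare the two, here is a minimal sketch of both scalers applied column by column to one combined dataset, using only the Python standard library. It assumes the usual definitions (standardization: subtract the mean, divide by the standard deviation; robust scaling: subtract the median, divide by the interquartile range). The function names and the toy numbers are made up for illustration, and this is not necessarily what your tools do internally.

```python
import statistics


def standardize(column):
    """Center on the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def robust_scale(column):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(column)
    q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return [0.0 for _ in column]
    return [(x - median) / iqr for x in column]


def scale_dataset(rows, scaler):
    """Apply one scaler to every descriptor column of a row-major dataset."""
    columns = zip(*rows)
    scaled_columns = [scaler(list(col)) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


# Toy dataset: each row is one slice, each column one descriptor.
# The last row contains an outlier in the first column.
rows = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
    [4.0, 40.0],
    [100.0, 50.0],
]
print(scale_dataset(rows, standardize))
print(scale_dataset(rows, robust_scale))
```

Running both on the same data is a quick way to see how each one treats outliers: the mean and standard deviation get dragged around by the extreme value, while the median and interquartile range mostly ignore it.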

I’ll be very curious to hear what comes of this!