Descriptors comparison (oldschool vs newschool)

rodrigo.constanzo · July 20, 2019, 8:44pm

I’m going to try building something that analyzes the first 100ms for a few of the descriptors (loudness max, pitch median, centroid mean, flatness mean, rolloff max90, spread mean) as well as analyzing the entire file for all sorts of stuff (and related stats).

I’ll then query for matches based on the first 100ms, and then query that subset again for the best match within that pool. That way I can get something that’s relatively close to the 12ms analysis window, but that also matches the whole file.
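
To make the two-stage idea concrete, here is a rough sketch in Python rather than Max. Everything in it is an assumption for illustration: the corpus values are made up, `two_stage_match`, `pool_size` and `SCALES` are hypothetical names, and the scaled Euclidean distance is just one plausible way to compare descriptors, not what entrymatcher does internally.

```python
import math

# Hypothetical corpus: (loudness dB, centroid Hz) for the first 100ms
# ("onset") and for the whole file ("whole"). All values are made up.
corpus = [
    {"name": "a", "onset": (-20.0, 1500.0), "whole": (-25.0, 1200.0)},
    {"name": "b", "onset": (-21.0, 1600.0), "whole": (-40.0, 3000.0)},
    {"name": "c", "onset": (-45.0, 4000.0), "whole": (-24.0, 1250.0)},
    {"name": "d", "onset": (-19.0, 1450.0), "whole": (-30.0, 1800.0)},
]

# Assumed rough ranges used to put dB and Hz on a comparable footing.
SCALES = (10.0, 1000.0)


def distance(p, q, scales=SCALES):
    """Euclidean distance after dividing each descriptor by its scale."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, scales)))


def two_stage_match(target_onset, target_whole, pool_size=3):
    """Stage 1: keep the pool_size entries closest on the first 100ms.
    Stage 2: return the entry in that pool closest on the whole file."""
    pool = sorted(corpus, key=lambda e: distance(e["onset"], target_onset))
    pool = pool[:pool_size]
    return min(pool, key=lambda e: distance(e["whole"], target_whole))


best = two_stage_match((-20.0, 1500.0), (-24.0, 1250.0))
print(best["name"])
```

Note that entry "c" is the closest whole-file match here, but it never makes it into the pool because its first 100ms is too far off. That is the intended trade-off: the onset match takes priority and the whole-file match only chooses among what survives.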

Perhaps this is more a question for @a.harker, but in all of the C-C-Combine stuff I’ve been using the <-> matcher because I’ve wanted something within a specific window, to use as a pseudo-weighting. In that use case, it is not unusual (and often desirable) to have a query return nothing.

For what I’m doing now, I always want a query returned, even if it’s very distant. I want the best possible match of the choices available. So is the distance matcher a better choice here? How does it work if you chain queries together?

Like in my example, I want to query the first 100ms (separate entries in entrymatcher) for loudness/centroid/flatness (in that order). If I use the distance matcher, will it return the single closest value for loudness, and then the single closest match for centroid of what’s left, etc…? Or does it pick a single match that is closest to all three descriptors? How would that work if I then want to query the returned subset based on that?
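
The two readings of the question can give different answers, which this Python sketch tries to show. It does not claim to describe how entrymatcher's distance matcher actually behaves; `sequential_match` and `joint_match` are hypothetical functions, and the entries, target and scales are invented for the example.

```python
import math

# Made-up entries: (loudness dB, centroid Hz, flatness dB).
entries = {
    "x": (-20.0, 5000.0, -10.0),
    "y": (-21.0, 1500.0, -30.0),
    "z": (-23.0, 1400.0, -29.0),
}
target = (-20.0, 1450.0, -30.0)
scales = (10.0, 1000.0, 10.0)  # assumed per-descriptor ranges


def sequential_match(entries, target, keep=1):
    """Reading 1: for each descriptor in order, keep only the `keep`
    closest entries of what is left, then move to the next descriptor."""
    remaining = dict(entries)
    for i in range(len(target)):
        ranked = sorted(remaining, key=lambda n: abs(remaining[n][i] - target[i]))
        remaining = {n: remaining[n] for n in ranked[:keep]}
    return next(iter(remaining))


def joint_match(entries, target, scales):
    """Reading 2: pick the single entry with the smallest combined
    (scaled Euclidean) distance across all descriptors at once."""
    def dist(v):
        return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(v, target, scales)))
    return min(entries, key=lambda n: dist(entries[n]))


print(sequential_match(entries, target))     # decided by loudness alone
print(joint_match(entries, target, scales))  # decided by all three together
```

With `keep=1` the sequential version is settled entirely by the first descriptor: "x" wins on loudness even though its centroid and flatness are far off, whereas the joint version picks "y". So if the chain keeps a single match per stage, the order of descriptors matters a great deal and the later ones have nothing left to choose from.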

I could still use the <-> matcher with really wide criteria (loudness_max <-> 100. -106.656967) and narrow the queries down as I move along, but that runs the risk of not returning a match if I go far enough down the chain.
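
One way around the empty-result risk is to stop narrowing as soon as a stage would return nothing and keep the last non-empty pool. A minimal Python sketch of that idea follows. It treats each range test as "target plus or minus a width", which is an assumption about the intent and not confirmed `<->` semantics; `chained_range_query` and the data are hypothetical.

```python
def chained_range_query(entries, criteria):
    """entries: name -> dict of descriptor values.
    criteria: list of (descriptor, target, width) in priority order.
    Each stage keeps entries within target +/- width. If a stage would
    empty the pool, narrowing stops and the previous pool is returned."""
    pool = list(entries)
    for descriptor, target, width in criteria:
        narrowed = [n for n in pool if abs(entries[n][descriptor] - target) <= width]
        if not narrowed:
            break  # keep the last non-empty pool instead of returning nothing
        pool = narrowed
    return pool


# Made-up data for illustration.
entries = {
    "a": {"loudness_max": -12.0, "centroid_mean": 1500.0},
    "b": {"loudness_max": -30.0, "centroid_mean": 4000.0},
    "c": {"loudness_max": -14.0, "centroid_mean": 6000.0},
}
criteria = [
    ("loudness_max", -13.0, 5.0),     # wide first stage
    ("centroid_mean", 3000.0, 500.0), # too narrow: nothing in the pool fits
]

print(chained_range_query(entries, criteria))
```

Here the centroid stage matches nothing in the pool left by the loudness stage, so the query falls back to the loudness-only pool and still returns candidates. A final nearest-match pass over that pool would then give a single result.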