Ok, going to code something up with these numbers: 64/256/768, starting 512 samples late, and analyzing full windows for each bigger size.
I’ll see if I can get some querying and brutalist playback based on like-to-like, and then experiment with the temporal mismatch stuff.
Will post patches/results.
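For anyone following along, here is a rough Python sketch of how I read the windowing plan. It is not the actual patch. The assumptions: all three windows share one start point, "starting 512 samples late" means an offset added to that start point, and `nested_windows`, `onset` and `START_OFFSET` are made-up names for illustration.

```python
# Nested analysis windows, all anchored at the same start point (assumption).
WINDOW_SIZES = (64, 256, 768)  # samples, from the post
START_OFFSET = 512             # samples; treating "512 samples late" as a start offset is an assumption


def nested_windows(signal, onset, sizes=WINDOW_SIZES, offset=START_OFFSET):
    """Return {size: samples}, each window starting at onset + offset.

    Every window is a full window from the shared start, so the 256 window
    contains the 64 window, and the 768 window contains both.
    Windows that run past the end of the signal come back truncated.
    """
    start = onset + offset
    return {size: signal[start:start + size] for size in sizes}


signal = list(range(2048))  # stand-in for audio samples
windows = nested_windows(signal, onset=0)
for size, chunk in windows.items():
    print(size, len(chunk))
```

Because the bigger windows are analyzed in full rather than only their new portion, the first 64 samples are counted in all three sets of descriptors.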
edit:
Sticking with the descriptors and such that I know (will save MFCCs for another time, as I already have enough dimensions going here).
Also leaning towards over-analyzing in each time window, seeing what works, and then revising from there.
But this is what I’m aiming to do:
Entries to store in the database:
index - index
name - name
duration - duration
time_centroid - time centroid
onsets - onsets

64_loudness_mean
64_loudness_max
64_centroid_mean
64_flatness_mean
64_rolloff_max (max90)

256_loudness_mean
256_loudness_max
256_pitch_median
256_pitch_confidence
256_pitch_optimized
256_centroid_mean
256_flatness_mean
256_rolloff_max

256_loudness_derivative (mean of deriv)
256_loudness_deviation (std dev)
256_centroid_derivative
256_centroid_deviation
256_rolloff_derivative
256_rolloff_deviation

768_loudness_mean
768_loudness_max
768_pitch_median
768_pitch_confidence
768_pitch_optimized
768_centroid_mean
768_flatness_mean
768_rolloff_max

768_loudness_derivative (mean of deriv)
768_loudness_deviation (std dev)
768_centroid_derivative
768_centroid_deviation
768_rolloff_derivative
768_rolloff_deviation

loudness_mean
loudness_max
pitch_median
pitch_confidence
pitch_optimized
centroid_mean
flatness_mean
rolloff_max

loudness_derivative (mean of deriv)
loudness_deviation (std dev)
centroid_derivative
centroid_deviation
rolloff_derivative
rolloff_deviation
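
To sanity-check the list above, here is a small Python sketch that builds the same field names and shows what the per-window statistics might look like. It is only an illustration: `summarize`, `columns` and the grouping constants are hypothetical names, and using a population standard deviation for "deviation" is my assumption, since the list only says "std dev".

```python
from statistics import mean, pstdev

BASE = ["index", "name", "duration", "time_centroid", "onsets"]
SHORT = ["loudness_mean", "loudness_max", "centroid_mean",
         "flatness_mean", "rolloff_max"]                      # 64-sample window
FULL = ["loudness_mean", "loudness_max", "pitch_median",
        "pitch_confidence", "pitch_optimized", "centroid_mean",
        "flatness_mean", "rolloff_max"]
DYNAMICS = [f"{d}_{s}" for d in ("loudness", "centroid", "rolloff")
            for s in ("derivative", "deviation")]

# 64 gets the short set; 256, 768 and the unprefixed group get everything.
columns = (BASE
           + [f"64_{n}" for n in SHORT]
           + [f"{p}{n}" for p in ("256_", "768_", "")
              for n in FULL + DYNAMICS])


def summarize(frames):
    """Summary stats for one descriptor's frame-by-frame values."""
    diffs = [b - a for a, b in zip(frames, frames[1:])]
    return {
        "mean": mean(frames),
        "max": max(frames),
        "derivative": mean(diffs) if diffs else 0.0,  # mean of first difference
        "deviation": pstdev(frames),                  # population std dev (assumption)
    }


print(len(columns))
print(summarize([1.0, 2.0, 4.0]))
```

That comes to 52 columns per entry, which is part of why MFCCs are waiting for another time.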