Hybrid/Layered resynthesis (3-way)

OK, I'm going to code something up with these window sizes: 64/256/768 samples, starting 512 samples late, and analyzing the full window at each bigger size.
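As a rough sketch of the slicing in Python (not the actual patch): this assumes all three windows start at the same sample position and that a window is simply skipped when there isn't enough audio left to fill it. `slice_windows` and `onset` are made-up names for illustration, and the 512-sample offset would just be folded into whatever `onset` gets passed in.

```python
def slice_windows(signal, onset, sizes=(64, 256, 768)):
    """Return {size: samples} for each full analysis window starting at `onset`.

    Assumption: every window starts at the same sample position.
    """
    windows = {}
    for size in sizes:
        end = onset + size
        if end > len(signal):
            # Not enough audio left for a full window of this size, so skip it.
            continue
        windows[size] = signal[onset:end]
    return windows


# Dummy signal: 1000 samples whose values equal their own index.
signal = list(range(1000))
windows = slice_windows(signal, onset=100)
print({size: len(samples) for size, samples in windows.items()})
```

Since every window starts at the same point, each bigger window contains the smaller ones, so the 256 and 768 entries re-describe the opening 64 samples plus whatever comes after.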

I’ll see if I can get some querying and brutalist playback based on like-to-like, and then experiment with the temporal mismatch stuff.

Will post patches/results.

edit:
Sticking with the descriptors and such that I know (will save MFCCs for another time, as I already have enough dimensions going here).

Also leaning towards over-analyzing in each time window, seeing what works, and then revising from there.

But this is what I’m aiming to do:

Entries to store in the database:

index - index
name - name
duration - duration
time_centroid - time centroid
onsets - onsets

64_loudness_mean
64_loudness_max
64_centroid_mean
64_flatness_mean
64_rolloff_max (max90)

256_loudness_mean
256_loudness_max
256_pitch_median
256_pitch_confidence
256_pitch_optimized
256_centroid_mean
256_flatness_mean
256_rolloff_max

256_loudness_derivative (mean of deriv)
256_loudness_deviation (std dev)
256_centroid_derivative
256_centroid_deviation
256_rolloff_derivative
256_rolloff_deviation

768_loudness_mean
768_loudness_max
768_pitch_median
768_pitch_confidence
768_pitch_optimized
768_centroid_mean
768_flatness_mean
768_rolloff_max

768_loudness_derivative (mean of deriv)
768_loudness_deviation (std dev)
768_centroid_derivative
768_centroid_deviation
768_rolloff_derivative
768_rolloff_deviation

loudness_mean
loudness_max
pitch_median
pitch_confidence
pitch_optimized
centroid_mean
flatness_mean
rolloff_max

loudness_derivative (mean of deriv)
loudness_deviation (std dev)
centroid_derivative
centroid_deviation
rolloff_derivative
rolloff_deviation
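
For what it's worth, here is a minimal Python sketch of how the stat fields above could be computed and named. It assumes each descriptor arrives as a list of per-frame values for a given window, that "derivative" means the mean of the frame-to-frame differences, and that "deviation" is the population standard deviation (sample std dev would be just as reasonable). `summarize` and `build_entry` are hypothetical helpers, the numbers are dummy values, and the pitch fields are left out.

```python
import statistics


def summarize(frames):
    """Summary stats for one descriptor's per-frame values within a window."""
    # Frame-to-frame differences; empty if there is only one frame.
    diffs = [b - a for a, b in zip(frames, frames[1:])]
    return {
        "mean": statistics.fmean(frames),
        "max": max(frames),
        # Assumption: "derivative" = mean of the first difference.
        "derivative": statistics.fmean(diffs) if diffs else 0.0,
        # Assumption: "deviation" = population standard deviation.
        "deviation": statistics.pstdev(frames),
    }


def build_entry(window, frames_by_descriptor, wanted):
    """Build database fields such as '256_loudness_mean'.

    `window` is the window size used as a name prefix, or None for no prefix.
    `wanted` is a list of (descriptor, stat) pairs to keep.
    """
    prefix = f"{window}_" if window is not None else ""
    entry = {}
    for descriptor, stat in wanted:
        stats = summarize(frames_by_descriptor[descriptor])
        entry[f"{prefix}{descriptor}_{stat}"] = stats[stat]
    return entry


# Dummy per-frame loudness values for a 256-sample window.
entry = build_entry(
    256,
    {"loudness": [-30.0, -24.0, -20.0]},
    [("loudness", "mean"), ("loudness", "max"), ("loudness", "derivative")],
)
print(entry)
```

Passing an explicit `wanted` list keeps the over-analysis cheap to prune later: computing everything costs little, and dropping a field is just removing a pair from the list.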
