Yes, but with a few major differences: there is no time division, which was the strength of LPT. I think it is time for me to be candid about this patch in writing. I’ll do that just below, but first I’ll answer the specifics:
Yes. The process you describe is a little short of the truth, so here goes in pseudo-code:
//analysis
for each slice:
- take the pitch, process like in example 10b (weighted by stringent confidence, thresholded, resulting in a very sparse dataset of valid entries, but we know they are valid). Put 4 dims in PitchDS as is.
- take the loudness, put that in LoudDS, 4 dims, as is
- take MFCC, weight coeffs 1 to 12 (scrap 0) by the loudness from a high ceiling of -70LU, put the same 4 stats as above on these 12 coeffs in a MFCC-DS, that you then standardise, and PCA to 4 dims into TimbreDS
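
To make the loudness-weighting step concrete, here is a rough Python sketch rather than the patch itself. The four statistics are not named above, so it computes only a weighted mean and standard deviation per coefficient for illustration. The mapping from loudness to weight (zero at or below -70 LU, growing linearly above it) is my assumption, and all function names are hypothetical.

```python
import math

def loudness_weights(loudness, floor=-70.0):
    # ASSUMPTION: frames at or below `floor` get weight 0, louder frames
    # get a weight proportional to how far above the floor they are.
    return [max(0.0, value - floor) for value in loudness]

def weighted_mean_std(values, weights):
    # Weighted mean and (population) standard deviation of one feature.
    total = sum(weights)
    if total == 0:
        return None  # no frame carries any weight
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)

def slice_mfcc_stats(mfcc_frames, loudness):
    # mfcc_frames: one list of 13 coefficients per analysis frame.
    # Coefficient 0 is scrapped; coefficients 1 to 12 are kept.
    weights = loudness_weights(loudness)
    stats = []
    for c in range(1, 13):
        column = [frame[c] for frame in mfcc_frames]
        result = weighted_mean_std(column, weights)
        if result is None:
            return None  # slice is entirely below the floor
        stats.extend(result)
    return stats  # 12 coefficients x 2 illustrative stats
```

The resulting rows would then be standardised and reduced with PCA, which this sketch leaves out.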
//assembly of weighted DS and its query
for each slice:
- normalise the 3 DSs - this is to scale their relative Euclidean distance as pointed to by Daniele.
- put in a tree
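
The assembly step could look roughly like the following Python sketch. It assumes "normalise" means min-max scaling of each column to the 0..1 range, and that every slice has a row in all three datasets (which glosses over the sparse pitch data). The function names are hypothetical.

```python
def fit_minmax(rows):
    # One (min, max) pair per column of a dataset.
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_minmax(row, coeffs):
    # Scale each value to 0..1; a constant column maps to 0.
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, coeffs)]

def assemble(loud_ds, pitch_ds, timbre_ds):
    # Normalise each dataset on its own, then join the three rows of
    # every slice into a single point. The coefficients are returned
    # because the query has to reuse them.
    datasets = (loud_ds, pitch_ds, timbre_ds)
    coeffs = [fit_minmax(ds) for ds in datasets]
    points = []
    for rows in zip(*datasets):
        point = []
        for row, c in zip(rows, coeffs):
            point.extend(apply_minmax(row, c))
        points.append(point)
    return points, coeffs
```

Because each dataset ends up in the same range, no single descriptor dominates the distance just because of its units.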
//for querying
- analyse the target as each item above (including std and PCA for the target MFCC)
- normalise each of LPT according to the coeffs in the assembly
- query the tree
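
As a sketch of the query side in Python: the target is scaled with the coefficients stored at assembly time, not with its own range, and then matched against the corpus. A brute-force nearest-neighbour search stands in for the tree here, which gives the same answer on small data but is not what the patch uses. The names and the (min, max) coefficient format are assumptions.

```python
import math

def normalise_with(row, coeffs):
    # Reuse the (min, max) pairs fitted on the corpus, so the target
    # lands in the same space as the stored points.
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, coeffs)]

def nearest(points, query):
    # Brute-force stand-in for the tree query: index and Euclidean
    # distance of the closest stored point.
    best_index, best_dist = -1, math.inf
    for i, point in enumerate(points):
        d = math.dist(point, query)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist
```

For example, with `coeffs = [(0.0, 10.0), (100.0, 200.0)]` a raw target of `[6.0, 140.0]` is scaled before `nearest` is called on the assembled points.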