Example 11 - verbose

Yes, but with a few major differences: there is no time division, which was the strength of LPT. I think it is time for me to be candid about this patch in writing. I'll do that just below, but first I'll answer the specifics:

Yes. The process you describe falls a little short of the full picture, so here it is in pseudo-code, with a quick Python sketch after each block:

//analysis
for each slice:
  - take the pitch, process it like in example 10b (weighted by a stringent confidence, thresholded, resulting in a very sparse dataset of valid entries, but we know they are valid). Put the 4 dims in PitchDS as is.
  - take the loudness, put that in LoudDS, 4 dims, as is
  - take the MFCCs, weight coefficients 1 to 12 (scrap 0) by the loudness from a high ceiling of -70 LU, put the same 4 stats as above on these 12 coefficients in an MFCC-DS, which you then standardise and PCA down to 4 dims into TimbreDS
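
To make that concrete, here is a minimal Python sketch of the per-slice analysis, assuming NumPy arrays of per-frame pitch, pitch confidence, loudness and 13 MFCC coefficients. The function names, the 0.9 confidence threshold, and my reading of "a high ceiling of -70 LU" as a weighting floor at -70 are illustrative assumptions, not the patch itself:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def four_stats(x, weights=None):
    """Mean, std, min, max of one descriptor over the frames of a slice."""
    if weights is None or weights.sum() == 0:
        return np.array([x.mean(), x.std(), x.min(), x.max()])
    m = np.average(x, weights=weights)
    s = np.sqrt(np.average((x - m) ** 2, weights=weights))
    return np.array([m, s, x.min(), x.max()])

def analyse_slice(pitch_hz, pitch_conf, loudness_db, mfcc, conf_thresh=0.9):
    """Return (pitch_row, loud_row, mfcc_row) for one slice; mfcc is (n_frames, 13)."""
    # pitch: keep only frames above a stringent confidence threshold and
    # weight the stats by that confidence -> sparse but trustworthy entries
    valid = pitch_conf >= conf_thresh
    pitch_row = (four_stats(pitch_hz[valid], pitch_conf[valid])
                 if valid.any() else np.full(4, np.nan))

    # loudness: the same 4 stats, as is
    loud_row = four_stats(loudness_db)

    # MFCCs: drop coefficient 0, weight 1 to 12 by loudness above a -70 LU
    # floor (my interpretation of the -70 LU ceiling), 4 stats each -> 48 dims
    w = np.clip(loudness_db + 70.0, 0.0, None)
    mfcc_row = np.concatenate([four_stats(mfcc[:, i], w) for i in range(1, 13)])
    return pitch_row, loud_row, mfcc_row

def build_timbre_ds(mfcc_rows, n_dims=4):
    """Standardise the stacked 48-dim MFCC stats, then PCA them down to 4 dims."""
    scaler = StandardScaler().fit(mfcc_rows)
    pca = PCA(n_components=n_dims).fit(scaler.transform(mfcc_rows))
    return pca.transform(scaler.transform(mfcc_rows)), scaler, pca
```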

//assembly of the weighted DS and its query
for each slice:
  - normalise the 3 DS - this is to scale their relative Euclidean distances, as pointed out by Daniele.
  - put in a tree
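
A minimal sketch of that assembly, assuming pitch_ds, loud_ds and timbre_ds are (n_slices, 4) arrays produced as above; scipy's cKDTree stands in for whatever tree the patch actually uses:

```python
import numpy as np
from scipy.spatial import cKDTree

def normalise(ds):
    """Scale each column to [0, 1]; also return the (min, range) coefficients
    so a query can be rescaled the same way later."""
    lo, hi = ds.min(axis=0), ds.max(axis=0)
    rng = np.where(hi - lo == 0, 1.0, hi - lo)
    return (ds - lo) / rng, (lo, rng)

def assemble(pitch_ds, loud_ds, timbre_ds):
    """Normalise the three datasets so their Euclidean distances are comparable,
    concatenate them per slice (12 dims), and index the result in a k-d tree.
    Slices with no valid pitch (NaN rows) would need dropping or filling first."""
    parts, coeffs = [], []
    for ds in (pitch_ds, loud_ds, timbre_ds):
        scaled, c = normalise(ds)
        parts.append(scaled)
        coeffs.append(c)
    return cKDTree(np.hstack(parts)), coeffs
```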

//for querying
- analyse the target like each item above (including standardisation and PCA for the target MFCCs)
- normalise each of L, P and T according to the coefficients from the assembly
- query the tree
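
And a sketch of the query path under the same assumptions: the target gets the same per-slice stats, its 48-dim MFCC stats go through the scaler and PCA fitted on the corpus, each of L, P and T is rescaled with the normalisation coefficients stored at assembly time, and the tree returns the nearest slices. The names (tree, coeffs, scaler, pca) refer to the objects from the two sketches above:

```python
import numpy as np

def query_tree(tree, coeffs, scaler, pca,
               target_pitch_row, target_loud_row, target_mfcc_row, k=5):
    """Return (distances, indices) of the k nearest corpus slices to the target."""
    # project the target's MFCC stats with the corpus-fitted scaler and PCA
    timbre_row = pca.transform(scaler.transform(target_mfcc_row.reshape(1, -1)))[0]
    # rescale pitch, loudness and timbre with the stored (min, range) coefficients
    parts = [(row - lo) / rng
             for row, (lo, rng) in zip((target_pitch_row, target_loud_row, timbre_row),
                                       coeffs)]
    return tree.query(np.concatenate(parts), k=k)
```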