Bufstats~ on short buffers?

I will run some tests to see what leads to better results: replacing MFCC0 with loudness or scrapping it altogether.

Keep us posted! Also, I presume that voice X voice, like you do, should be in roughly the same ballpark, but voice X piano will be a more potent research field for loudness vs MFCC0 vs none, and raw values vs normalized vs standardized.

Me neither. I still got something. For instance, I have said something wrong in so many threads now… I keep saying that MFCC0 is linear in amplitude, which it isn’t. But it is not properly logarithmic either, as this patch tries to explore (dirty and uncommented, but with so few items it should be clear):


----------begin_max5_patcher----------
1273.3oc0ZssjahCD8Y6uBJp8oc83R23V9U1J0Tx.1gTfvEHxNSRk4ae0M7X
OACJCBONyCXPHotOsNc2pEyOVuxeW8S4s9dex6e8Vs5GqWsR0jrgUlmW4WQe
Jsj1p5l+9xZVWk+F8qNR4oeofc3wl7TtdZf3vsfMdA.j9G4Uj3duOaFiX3Er
xbtZ9flFa4OWlqDP+TWjodrd2We.idUdMzpbddyi4L5N8H.8uqIuMmwo7hZ1
E5C4L8AXtbRa1W2TQUcM7U8qti2qfHSq5l3OeLWOsB8zyeGkcv26yxd7y0qk
W1XoYjk+eBj0CKd9SJUvmxqy1MhwEAT.fDmn9A5BiKvePfCuJveW.dZdSjh2
PHINi2fRlCuIJ9L84Nm278J5QuGvwXuHjPM8faGyRGqIQPkAGBmfEEXkoN99
fEgBBTnIA6NVTjMrnOH9vzFDhZUFqYytwfPliaUXx1n.oBA9Cvu5eD5lEFWj
x3hmvQBYkwE+a5HIX.R60hPdRfZtCvYbG37BIiLbG78D2oJuskdH+WHO+EbZ
SKNRGtZav7YNmRjOk4iLrq2hF49JNX6K6Jx1VV2kwDVwWlzfgzrvDWryG3Ux
YcUZRYQKWRUVHuMLFrMvwta2y4tdW9M8FI.1YNNvfaxdWp5J4EskEY4MifOB
PGQHLQu7qJdAI2i1bxXG12Xaw2U8BhsfWb3K0s7czFYaDznTiTJK6YwE03SF
iu3eBGs4bAPD1H8aEUhnBIpBI4tHLYzzwBq.SzYvI5LcNHCN7JYvAtCShT9Y
RuGuGHfw7UfJToqfIF6h3IH+OvjEY630zw1NFRgTHB3rpigvIRp1Ky6nbp1Z
lz0H4FyzcxgHzCsP2cHBI+QW8m1djD5Lyw87FJth+veOV7ecXQXRfy1NQvDQ
LhIZJZzbBXzVbfQKWfshWsOM8EuWygM3gPpCYPTlt3.GvpH2zch+d3Ili0FR
bWAa+tU5u.q4r5h1bafsIkZnKN9nOdXKw2KBbflLvPjhcGNQNRrU4Hc.tUhW
v9Yu8i1n.fr8KMFs0cMo8SX+o868JJxxa4ELUDpy6DR2oAM31JI4guOsj.tP
RwVHod0YVRRVo3zR5R0otQVvmLlv7.YjMhVDRwCdS.IdXPBmGHw1HZxsZkDF
MLHAySzVQWchigMN6PW3XXk0LdInLDajbvRHYavLYIHP1.Y4I9r.TWKjbvsh
NYh041vrVK4Ars34YaQVH5jEYU0FIGsDtP1jSCBcQ3daXtNI4IvRWj4JHa1c
EzEIQv2HedjkqPfA1PL83wuk2zZ5rRDh5.9ZshhFsQ8XAS+nZu5hJ.9VQe+U
+CN3SaDU.vEa+uqQuE9mB0Uk5WUKH6rtBCeW.t0ls5+lMgynU5gJpAITVXhB
LBrrm1UxuD+xZQRqKqadrfI0cio3hSQTvRNqhicGT82TWl4slKfsAm5GMMMm
wOqufswxdDiSPvP4cQfjX.wL6Zi4FyZUiXnusJF8WtnONy4qsW.XU4PiA4KT
IBBGfRjZCNBEPT2ghHAPWqVW7Ulb2Zw9hxxS.Z0o4RLYlJ37OzPyJD5sorOC
9gmVSBE+EsYv6NSTlgg5GVTbrvxsYv690g0OJTB.lnV6wl0d4chlfuYTT1As
QFo9JjlVO1Terto2YTLzjS8uiWeBn8YhzNmWc46RuC6VE0UGaM6RslQDzJ.R
cm.yBc98xrDi4mq+eeOnSP.
-----------end_max5_patcher-----------

I tried to match it to dB by offsetting, then by matching the full range from -144 dB to 0 dBFS, to no avail. It is not linear and has a much wider range, but its exponential curve is slightly shallower than the dB conversion (a bit more linear, but still on a log scale).
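For anyone who wants to poke at this outside Max, here is a minimal Python sketch of a textbook c0: the unnormalised zeroth DCT coefficient of log band energies, which is just their sum. It is not FluCoMa's implementation. The band energies, the `floor` value and the function names are all made up for illustration. In this form c0 moves in a straight line against dB until the bands hit the log floor, then flattens, so any extra scaling or flooring in a real implementation could bend the curve away from a plain dB conversion.

```python
import math

# Hypothetical band energies for one full-scale frame (illustrative only).
FRAME = [0.5, 0.2, 0.05, 0.01]

def c0(band_energies, floor=1e-10):
    """Textbook zeroth coefficient: sum of floored log band energies."""
    return sum(math.log(max(e, floor)) for e in band_energies)

def c0_at_gain(gain, floor=1e-10):
    """c0 of FRAME after scaling the amplitude by `gain`."""
    # Energy scales with the square of the amplitude gain.
    return c0([e * gain * gain for e in FRAME], floor)

def amp_to_db(gain):
    """Plain amplitude-to-dB conversion for comparison."""
    return 20.0 * math.log10(gain)

if __name__ == "__main__":
    for gain in (1.0, 0.5, 0.1, 0.01, 0.001, 1e-6, 1e-7):
        print(f"{amp_to_db(gain):8.2f} dB   c0 = {c0_at_gain(gain):9.3f}")
```

Printing the two columns side by side makes it easy to see where the toy c0 stops tracking dB.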

Anyway, the values you get with loudness will be different, so that will change the distances. This will change the nearest neighbour, and did in my case, but probably not as drastically as using neither MFCC0 nor loudness, i.e. spectral matching only.
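To make the effect on the distance concrete, here is a small Python sketch with invented numbers (not taken from any real corpus). It compares the nearest neighbour of one target under three descriptor layouts: MFCC0 plus two other coefficients, loudness in place of MFCC0, and spectral coefficients only. The `corpus`, `target`, `VARIANTS` and `nearest` names are hypothetical, and the optional standardization is a plain z-score computed from the corpus.

```python
import math

# Invented rows, purely illustrative: (mfcc0, loudness_db, mfcc1, mfcc2)
corpus = {
    "a": (-120.0, -30.0, 4.0, 1.0),
    "b": (-60.0, -12.0, 9.0, -2.0),
    "c": (-200.0, -20.0, 8.5, -1.5),
}
target = (-70.0, -29.0, 8.0, -1.0)

# Which columns each descriptor layout keeps.
VARIANTS = {
    "mfcc0": (0, 2, 3),
    "loudness": (1, 2, 3),
    "spectral_only": (2, 3),
}

def pick(row, cols):
    return [row[i] for i in cols]

def column_stats(rows):
    """Per-column mean and population std (std of 0 is replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def nearest(variant, standardized=False):
    """Key of the corpus entry closest to the target (Euclidean distance)."""
    cols = VARIANTS[variant]
    rows = {k: pick(v, cols) for k, v in corpus.items()}
    query = pick(target, cols)
    if standardized:
        means, stds = column_stats(list(rows.values()))
        def scale(r):
            return [(x - m) / s for x, m, s in zip(r, means, stds)]
        rows = {k: scale(v) for k, v in rows.items()}
        query = scale(query)
    return min(rows, key=lambda k: math.dist(rows[k], query))

if __name__ == "__main__":
    for name in VARIANTS:
        print(name, nearest(name), nearest(name, standardized=True))
```

With raw values the wide-ranging first column dominates the distance, which is why the scaling choice (raw, normalized or standardized) is worth testing alongside the choice of descriptor.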