Fundamentally it will be “short sounds” in on the left, and “long sounds” out on the right.
But I think I was thinking about this incorrectly (and likely still am). I was trying this fork of the process as a way to semi-objectively assess the efficacy of the PCA->UMAP pipeline (for this kind of material) by having the same pipeline/statistics, and importantly the same fits, on each side.
BUT
I think what actually needs to happen is to have the best set of descriptors/stats (as determined by PCA->UMAP) for the teeny/tiny samples as the input, and then the output being a different set of descriptors/stats which includes parameters that are not possible in the short time window (hence the reason for doing this in the first place). So the longer windows would likely have better data for some vanilla stuff, but should also include richer morphological descriptors/stats. So perhaps having more derivatives (or perhaps no derivs at all on the short windows), or perhaps having all frames of loudness-based descriptors rather than statistical summaries. Also including custom temporal descriptors like time centroid, “timeness”, or stuff like what @balintlaczko was doing, where the relationship between the attack portion and the sustain is a descriptor too.
So the idea would be to see if whatever best represents the shrimpy samples could be correlated/regressed with the actual richer, more meaningful, analysis that’s possible on the longer time scale.
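Something like this is the shape of the experiment I have in mind, roughly sketched with scikit-learn (the array names, file names, and network settings are all placeholders; in Max this would be fluid.mlpregressor~):

```python
# Rough sketch (stand-in for fluid.mlpregressor~): regress the reduced
# short-window stats onto the reduced long-window stats.
# "short_umap.npy" / "long_umap.npy" are placeholder names for the two
# descriptor pools, one row per sample in both.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

short_stats = np.load("short_umap.npy")   # (n_samples, d_short)
long_stats = np.load("long_umap.npy")     # (n_samples, d_long)

X_train, X_test, y_train, y_test = train_test_split(
    short_stats, long_stats, test_size=0.2, random_state=0)

reg = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
reg.fit(X_train, y_train)

# R^2 on held-out samples: a crude measure of how much of the long-window
# analysis is actually predictable from the tiny window.
print("held-out R^2:", reg.score(X_test, y_test))
```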
In either case, the number of dimensions on either end is quite variable as it’s going through PCA->UMAP to get there.
I’m also hoping to then have some kind of weighted comparison when querying for samples and such where (potentially) the shorter analysis window is weighted more heavily as it’s “real” and “measured”, whereas the longer window is taken into consideration, but only partially as it’s predictive/regressed. But one problem at a time.
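Something like this is the kind of weighting I mean (a sketch only; the weights and the dimension split are completely made up):

```python
# Made-up weighting sketch: measured short-window dims count fully,
# regressed long-window dims only partially.
import numpy as np

def weighted_query(query, corpus, d_short, w_measured=1.0, w_predicted=0.3):
    # rows are [short-window dims | predicted long-window dims]
    w = np.concatenate([np.full(d_short, w_measured),
                        np.full(corpus.shape[1] - d_short, w_predicted)])
    dists = np.sqrt((w * (corpus - query) ** 2).sum(axis=1))
    return np.argsort(dists)   # corpus indices, nearest first
```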
///////////////////////
p.s. all of the main FluCoMa crew (@weefuzzy, @jamesbradbury, @tedmoore) have workshop-specific suffixes still in their display names (e.g. “tedmoore Oslo Workshop”). I’m always surprised when I see the email notification, as that’s what shows up first in the text.
I’d be curious to hear what your criteria are for this!
When training a neural network it can be useful to try to imagine what relationship the neural network might be able to learn. So in your case here, if the inputs are stats of the raw descriptors and the outputs are that plus the stats of the derivatives, that might be very hard for the neural network to learn to predict. The inputs will just have no sense of differences across time.
Maybe it could get a decent guess of some derivative based on the min and max (as in, if the range is very big, there must be some change happening), but my gut says that’s tenuous.
It might be possible if the dataset is very particular, such as, if in a dataset the high sounds are always static (low mean derivative) and the low sounds are always changing (higher mean derivative), then maybe a neural network could learn to just predict higher mean derivative when the pitch is lower and vice versa. The problem is that as soon as you try to put in other sounds, the neural network will give poor predictions.
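To make that concrete, here’s a toy example (nothing to do with the actual data): two descriptor tracks with essentially identical min/max/mean but completely different derivatives, which static summary stats alone can’t tell apart:

```python
# Two toy descriptor tracks: same min/max (and near-identical mean),
# very different derivatives.
import numpy as np

t = np.linspace(0.0, 1.0, 100)
slow = t                                                  # smooth ramp, 0 to 1
fast = np.where(np.sin(40 * np.pi * t) >= 0, 1.0, 0.0)    # rapid flipping 0/1

for name, sig in (("slow", slow), ("fast", fast)):
    print(name, "min:", sig.min(), "max:", sig.max(), "mean:", sig.mean(),
          "mean |deriv|:", np.abs(np.diff(sig)).mean())
```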
As above. Trying to reason about what relationship a neural network might be able to learn may be a useful guide here.
If you have the longer analysis window, no need to include the short analysis window! (Since you were trying to predict the longer one anyway!?) However, what could be useful is to use the two in coordination as in, the attack portion and a longer portion as you mention above.
Well, I wanted to look at numbers to see if it was “correct”, and how often that was the case, rather than just trying an arbitrary set of descriptors and going “hmm, I guess that’s slightly better?”. Mainly cuz each new analysis pipeline still takes me ages to set up, so I can’t easily pivot between radically different sets of descriptors/stats to compare plots.
In this case they’d be completely different descriptors/stats. What I’m thinking, at the moment, is running a fresh PCA/UMAP on each pool of descriptors, so it may be that the dimensions have no overlap whatsoever between the two ends. In addition, the longer analysis may include more initial dimensions too.
So the short window may have mean of loudness, std of MFCC3, pitch confidence (etc…) and the long window may have mean of deriv of peak, time centroid, skewness of pitch (etc…).
My original original idea was to just have a classifier rather than a regressor such that I would have an accurate analysis of the input sound, and then “some kind of idea” of what the rest of the sound may be based on the nearest match in the pre-analyzed corpus.
The whole reason for this is that I want to do it in realtime, and don’t have 100ms to wait around for the longer analysis window. So the goal is to take a meager 256 sample analysis, and get some better info out of it than I can presently.
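So the classifier version would be something like this hypothetical scikit-learn sketch (file names and shapes are placeholders; in the patch this would be something like fluid.kdtree~): look up the nearest short-window match in the pre-analyzed corpus and borrow its longer analysis wholesale:

```python
# Hypothetical nearest-match version of the idea.
import numpy as np
from sklearn.neighbors import NearestNeighbors

corpus_short = np.load("corpus_short_umap.npy")   # reduced 256-sample stats
corpus_long = np.load("corpus_long_umap.npy")     # reduced long-window stats

nn = NearestNeighbors(n_neighbors=1).fit(corpus_short)

def guess_rest_of_sound(short_frame):
    # short_frame: reduced stats from the incoming 256-sample window
    _, idx = nn.kneighbors(short_frame.reshape(1, -1))
    return corpus_long[idx[0, 0]]   # "some kind of idea" of the longer analysis
```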
Basically revisiting the PCA->UMAP part of my processing chain now, and it’s super handy to know the variance from the fittransform output, though it’s less clear how to arrive at something like >95% programmatically (without risking stack overflow).
Essentially I want to have all my data->PCA->UMAP “verticals” (i.e. “loudness”, “timbre”, etc…) move forward with 95% coverage from PCA without me having to stop and massage the numdimensions for each corpus.
Not as it stands – obviously the current nightlies break all the outlets, and the patch was a huge mess anyway. But just getting the cumulative explained variance is simpler in any case. I’ve put a verbose patch at the bottom – no risk of overflow involved
I’m not sure what you mean here. What’s a ‘vertical’ for you? Also, I don’t quite get how you avoid massaging the number of dimensions – isn’t that the point of this exercise?
Also a lot smarter that way. I was initially thinking doing fittransform → [> 0.95] → [counter] → numdimensions and just looping that until it was true.
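For reference, the same idea in scikit-learn terms (a sketch; “loudness_stats.npy” is a placeholder for one descriptor pool): fit once, take the cumulative sum of the explained variance ratios, and read off the first dimension count that crosses 0.95 – no looping or refitting needed:

```python
# Sketch of the cumulative-variance approach.
import numpy as np
from sklearn.decomposition import PCA

data = np.load("loudness_stats.npy")

pca = PCA().fit(data)                                  # fit is always N->N
cumulative = np.cumsum(pca.explained_variance_ratio_)
numdimensions = int(np.searchsorted(cumulative, 0.95)) + 1
print("dims needed for >=95% variance:", numdimensions)
```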
Not sure what the name for it would be, but at the moment I have independent processing chains for each macro-descriptor (e.g. “loudness”, “timbre”, etc…), so each has its own standardization/reduction, and each has a different (and arbitrary, while I’m tweaking parameters) number of initial dimensions, so I didn’t want to have to dive into massaging the numbers manually for each change I make upstream (i.e. type of descriptor, amount of stats/derivs, etc…).
Be aware that by not naming the fluid.pca~ instances and instead fitting twice (as far as I can work out), you’re potentially doing much more computation than you need to: fitting is much more costly than transforming for PCA.
I think I’d take some more convincing that needing to slice here is as bad as all that. I’m certainly not on board with the idea of only reporting the explained variance for fit and not fittransform.
In this case this is only happening on the corpus-creation side, so cpu usage isn’t that big of a concern. Keeping them unnamed just means I don’t have to #0/--- if I want to abstract it later, so the cpu/time is definitely worth the tradeoff.
At the moment fit doesn’t report it at all, it instead sends a bang, ala oldschool interface. So there’s not really parity between fit and fittransform already.
The issue is more that fluid.pca~ uniquely and specifically breaks the ability to just chain processors.
Obviously this is an absurd processing chain, but there would be no indication that this one object/process needs to be treated differently.
Also, it would keep with the paradigm that the ‘left outlet is for chaining’ and the ‘right outlet is for other stuff’. Amount of variance seems to me to be squarely in the “other stuff” category.
At the moment fit doesn’t report it at all, it instead sends a bang, ala oldschool interface. So there’s not really parity between fit and fittransform already.
So it doesn’t. Which turns out to be because fit doesn’t ‘know’ (or care) about the number of dimensions. Fitting PCA is always an N->N mapping. Which is, IMO, all the more reason not to remove it from [fit]transform or, more generally, sacrifice the idea that messages might return some value as well as filling some output container.
It would also bring it into parity with fluid.mlpregressor~, which uses the fit message as a way to query how well it fit.
More than anything, breaking the nameless chaining with fittransform is a bummer since there’s no way to know that this object has a different/specific implementation when processing things as you can literally just chain any of the other objects.
I was gonna post this as a separate thread thinking there may be a bug, but it’s possible there’s something in your subpatch that doesn’t like different preprocessing?
Basically if I go: loudness -> *standardize* -> PCA -> normalize
Everything is happy pappy.
However, if I instead go: loudness -> *robustscale* -> PCA -> normalize
I get some fucked looking data (basically it tells me 1d is enough to get 100% coverage, and that dimension consists of almost exclusively 0).
Something* about that robust-scaled data makes PCA explode as a process. Exactly the same happens if I process it using scikit-learn’s PCA in Python: the first singular value is absurdly huge (~4e16) and this swamps everything.
[*] Something turns out to be an extreme value of -1.1121164977963008e+16, which would explain it
I initially thought that my input data was super outlier-y and when removing all that the algorithm was like “most of your signal is zero, bro”, but this random/jongly example shouldn’t produce weird outliers like that as it’s random(ish) selections.
Makes me a bit dubious of robustscale as now I gotta double-check what I get further down the line.
On a whim I decided to try leaving out the true-peak channel for the loudness calculation ([fluid.bufstats~ @numchans 1 @numderivs 1] instead in the above example) and it’s not blowing up PCA. In fact, if I do only true-peak stats, it blows up.
This is what print on the first dataset looks like if I do @numchans 1 @startchan 1:
Ok, it’s a scaler bug when the range of a column is zero. In the case of robust scale it can be hard to spot up front because the range in question will be the difference between the selected quantiles (so a col could have valid min / max but mostly zeros and end up like this).
Fix in process.
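For the curious, a toy illustration (in scikit-learn terms, not the actual FluCoMa code) of why a single value like that swamps everything:

```python
# One absurdly large value, like the -1.1e16 the zero-range column produced,
# makes PCA report a single dimension covering ~100% of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
scaled = rng.normal(size=(100, 6))   # stand-in for the scaled loudness stats
scaled[0, 5] = -1.1e16               # the rogue value

pca = PCA().fit(scaled)
print(pca.explained_variance_ratio_)  # ~[1.0, ~0, ~0, ...]
print(pca.singular_values_[0])        # absurdly huge, cf. the ~4e16 above
```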
Meanwhile, question: do you find, empirically, that there’s much value in using both average loudness and peak together? They will be very strongly correlated (as will most of the derived stats), so it seems like you’d end up producing quite a lot of redundant data.
Most definitely not. I was just playing with settings and for the sake of simplicity it’s easier to have the fluid.bufstats~ object be the same across the board. If/when I can just @select loudness I’ll likely never see a true-peak again in my life!
Also in testing today, fluid.dataset~ won’t take a name that isn’t a symbol in the first place. It reports: fluid.dataset~: Shared object given no name – won't be shared!
Which is odd as it’s possible for it to have no name now anyways.
But since the name has to be a symbol, perhaps the fluid.datasetprocess~ objects can do a type check on expected dataset name inputs, and if it sees a float (which can only come from the output of fittransform $1-ing fluid.pca~), it can ignore the float part of the output and interpret fittransform u235352353 0.9352533 as being a <command> <buffer> <unrelated info which I can ignore>.