Intelligent Feature Selection (with SVM or PCA)

That seems a lot more sensible and what I would expect, so something’s gone funny in the code/math somewhere.

Does the squaring/weighting produce this for you?

0.547614 0.138316 0.081888 0.062887 0.031371 0.022768 0.02044 0.01675 0.013668 0.011574 0.009137 0.008754 0.007181 0.006379 0.005358 0.004786 0.004126 0.003053 0.002383 0.00116 0.000405

I don’t see how, multiplying those numbers by the matrix (unless the matrix itself heavily skews another way), I wouldn’t end up with what I did in the end (a sequence of items in the order that they are listed in the values field).

Yeah, curious how you get on with that. With my approach, I don’t so much have the luxury of time as my real-world waiting period is 256 samples (so even a single derivative is kind of pushing it there…) and the longer (predicted) window is 4410 samples, so again, not a lot of meat on those bones. I guess I can further segment the 4410, but I’m not entirely sure what I’ll find there (in terms of analyzing the sounds coming from my real-time input).

The reason I’m more interested in morphologies/gestures is that if they are separable from (absolute) durations, I could potentially query a much longer sample for a similar overall morphology regardless of it being 500ms or 5s long.

If that’s the case I’ve got oodles of “labelled” files. Now how much that corresponds to sounding results is yet to be determined (e.g. hitting a fork vs hitting a spoon).

I was poking at this a couple of years ago in this thread, about how to optimize some of these algorithms/descriptors/stats/searches if you have known inputs. This is somewhat where I’m at with the approach(es) from this patch: trying to get the most differentiation for a given set of inputs. But I’m sure there’s way more that can be done beyond that, I just wouldn’t even know where to begin.

Sure, we can document it, if it looks like digging around in these dictionaries is going to be a popular pastime. I guess the class designer felt like the names were expressive already (although they perhaps make more sense if you know that it’s SVD under the hood) :smiley:


Meanwhile, I went down a little rabbit hole last night reading around feature selection, and unsupervised selection in particular, with an eye on what PCA-based things were in the literature and what of the available techniques might be possible with the toolkit facilities as they stand. I’m trying some stuff out, will report in a separate topic later.

Some potted findings, that I may well follow up on one day:

  1. Irrespective of whatever technique, some visual inspection of PCA results can be useful to gauge how structured input data is. Just plotting values (a ‘scree plot’) and making sure that the characteristic ‘elbow’ is there will confirm that (at least) the data aren’t totally unstructured (within PCA’s operating assumptions).
    If the data have been standardised pre-PCA, you can also recover the pairwise correlations between input features – useful, on the basis that you generally don’t want lots of strongly correlated features downstream, as it adds more computation for no gain. Doing this involves some linear algebra fiddling, but it’s not too bad (I’ll try and make a shareable example one day – there’s a rough sketch after this list in the meantime).

  2. Feature selection for supervised models has been researched much more than for unsupervised. Perhaps not surprising, given that the supervised case is conceptually simpler (you know what you’re aiming for). In the unsupervised case, the challenge is similar to that faced by dimension reduction algorithms, in balancing the global and local structures of the data.

  3. There seem to have been a number of stabs at using PCA for unsupervised feature selection, although the literature is generally quite sniffy about them. One of these [1] seems to work by doing repeated fittings against a sub-selection of features, and comparing this to a fitting against all features. Another [2] uses K-Means on the PCA bases as a way to try and rank the input features. I’m trying this out in Python, using some stackexchange code, to see what happens (there’s a rough sketch of the idea after this list too).

  4. Recent schemes get more and more fancy. There’s interesting-looking stuff using a modified autoencoder setup, but it made my eyes water [4]. More manageable-looking is a scheme called Laplacian Score [3], which makes a nearest neighbours graph and uses this as a basis to figure out a ranking (which feels slightly similar to what UMAP does in its early stages, although I don’t know the details of that very well). That could be done with a KD-Tree and some patience, so I may well have a bash at that as well (a rough sketch of that one also follows the list).
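To make the first point a bit more concrete, here’s roughly what the scree plot and the correlation-recovery fiddling look like in plain NumPy/scikit-learn – purely illustrative, standing in for whatever the toolkit equivalent would be, and with a toy random array standing in for real descriptor data:

```python
# Illustrative sketch of point 1: scree plot, plus rebuilding the feature-feature
# correlations from a PCA fitted on standardised data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# toy stand-in for real descriptor data: (n_points, n_features)
X = np.random.default_rng(0).normal(size=(200, 8))

X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance per feature
pca = PCA().fit(X_std)                      # keep all components

# Scree plot: look for the characteristic 'elbow'
plt.plot(pca.explained_variance_ratio_, marker='o')
plt.xlabel('principal component')
plt.ylabel('proportion of variance')
plt.show()

# With standardised input and all components kept, the correlation matrix can be
# rebuilt from the eigenvectors and eigenvalues: R = V diag(lambda) V^T
# (matches np.corrcoef(X_std, rowvar=False) up to the usual n vs n-1 convention).
V = pca.components_.T                              # columns = eigenvectors / loadings
R = V @ np.diag(pca.explained_variance_) @ V.T     # pairwise feature correlations
```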
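The Principal Feature Analysis idea from [2] is roughly the following – again only a sketch, adapted from the kind of thing floating around stackexchange; the function name and the q / n_keep parameters are my own:

```python
# Rough take on Principal Feature Analysis [2]: represent each feature by its
# loadings on the first q components, cluster those, and keep one representative
# feature per cluster. X is (n_points, n_features); q and n_keep are choices.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def principal_feature_analysis(X, q=5, n_keep=8):
    X_std = StandardScaler().fit_transform(X)
    V_q = PCA(n_components=q).fit(X_std).components_.T   # one q-dim row per feature
    km = KMeans(n_clusters=n_keep, n_init=10).fit(V_q)
    keep = []
    for c in range(n_keep):
        # keep the feature whose loading vector sits closest to the cluster centre
        idx = np.where(km.labels_ == c)[0]
        d = np.linalg.norm(V_q[idx] - km.cluster_centers_[c], axis=1)
        keep.append(idx[np.argmin(d)])
    return sorted(keep)
```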
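And a very rough sketch of the Laplacian Score from [3], in case anyone fancies a bash before I get to it (smaller score = the feature respects the local neighbourhood structure better; k and t are tuning choices pulled out of the air):

```python
# Laplacian Score sketch [3]: build a k-nearest-neighbour graph with heat-kernel
# weights and score each feature against its Laplacian. X is (n_points, n_features).
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_scores(X, k=5, t=1.0):
    dist = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    S = np.where(dist > 0, np.exp(-(dist ** 2) / t), 0.0)   # heat-kernel weights
    S = np.maximum(S, S.T)                                   # symmetrise the graph
    D = np.diag(S.sum(axis=1))                               # degree matrix
    L = D - S                                                # graph Laplacian
    ones = np.ones(X.shape[0])
    scores = []
    for f in X.T:
        # remove the component along the degree-weighted constant vector
        f_t = f - ((f @ D @ ones) / (ones @ D @ ones)) * ones
        scores.append((f_t @ L @ f_t) / (f_t @ D @ f_t))
    return np.array(scores)
```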

[1] Krzanowski, W. J. (1987). Selection of Variables to Preserve Multivariate Data Structure, Using Principal Components. Applied Statistics, 36(1), 22. https://doi.org/10.2307/2347842
[2] Cohen, I., Tian, Q., Zhou, X. S., & Huang, T. S. (n.d.). Feature selection using principal feature analysis.
[3] He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. Advances in Neural Information Processing Systems, 18, 507–514.
[4] Wu, X., & Cheng, Q. (2020). Fractal Autoencoders for Feature Selection. ArXiv:2010.09430 [Cs]. http://arxiv.org/abs/2010.09430


Ooh thank you for all of this! Looking forward to making my eyes water as well!

The weights I get from your dataset are:
List[ 0.15497311073088, 0.096434062273653, 0.08044687827162, 0.070550946340388, 0.068899032671959, 0.066968458030149, 0.05724051045273, 0.050665581515714, 0.046898660129103, 0.045572671957601, 0.042492027343501, 0.03681728561751, 0.0362125150149, 0.032381655597829, 0.027450174548984, 0.023427297350118, 0.019409834491853, 0.018006072415304, 0.012436556155345, 0.0092469875671974, 0.0034696815236589 ]

Ok, it turns out I wasn’t normalizing the dataset before the PCA-ing, so in doing that I get the same weights as you now (well, diff float resolution):

0.154973 0.096434 0.080447 0.070551 0.068899 0.066968 0.057241 0.050666 0.046899 0.045573 0.042492 0.036817 0.036213 0.032382 0.02745 0.023427 0.01941 0.018006 0.012437 0.009247 0.00347

I still get shitty results after that step though.

If I multiply those weights by each element of the transposed (rotated) bases I now get this:

1, 0.085675 0.008393 0.001326 0.008187 0.007971 0.008628 0.013086 0.004254 0.002104 0.000748 0.004894 0.00036 0.001792 0.000635 0.000365 0.003239 0.002782 0.000216 0.008884 0.000597 0.000361;
2, 0.007919 0.007471 0.006706 0.040497 0.016871 0.009139 0.006336 0.007237 0.006255 0.017774 0.015751 0.007725 0.003578 0.010975 0.000198 0.003128 0.000263 0.003946 0.000041 0.000033 0.00014;
3, 0.052043 0.001921 0.000773 0.021265 0.008598 0.002079 0.005835 0.000392 0.006376 0.001308 0.004743 0.008552 0.00722 0.002093 0.001648 0.015659 0.004895 0.00475 0.002496 0.001147 0.000019;
4, 0.031974 0.022968 0.004003 0.001664 0.01861 0.004415 0.029216 0.013642 0.012323 0.001535 0.001403 0.018025 0.00695 0.001735 0.003461 0.00596 0.001939 0.000042 0.00262 0.000382 0.000004;
5, 0.045223 0.002707 0.007318 0.026383 0.005528 0.016972 0.019099 0.004125 0.007298 0.000378 0.017384 0.01283 0.002749 0.006888 0.001546 0.003838 0.002201 0.007323 0.00027 0.000619 0.000009;
6, 0.002331 0.038066 0.001567 0.001518 0.025555 0.016394 0.006657 0.015385 0.005177 0.023422 0.008098 0.00586 0.000801 0.007785 0.002735 0.005511 0.001809 0.004711 0.000332 0.000092 0.00007;
7, 0.003937 0.011633 0.019493 0.017428 0.001973 0.00893 0.017583 0.006289 0.002516 0.00732 0.008437 0.006867 0.016423 0.001173 0.010752 0.000511 0.003452 0.007802 0.002297 0.001273 0.000077;
8, 0.00387 0.045505 0.011949 0.013447 0.016632 0.007788 0.001185 0.002647 0.003663 0.005656 0.015184 0.009292 0.009916 0.015145 0.004242 0.003061 0.004089 0.002635 0.000627 0.001464 0.00017;
9, 0.014788 0.035159 0.034758 0.001285 0.0118 0.001598 0.004427 0.015564 0.018955 0.003427 0.004699 0.013617 0.00332 0.002438 0.002516 0.001481 0.004133 0.004965 0.001804 0.00204 0.000249;
10, 0.008042 0.020211 0.038046 0.005854 0.00503 0.00662 0.000684 0.011239 0.002538 0.022968 0.005309 0.00633 0.00962 0.007491 0.005555 0.005341 0.00237 0.002592 0.000283 0.002759 0.000412;
11, 0.002205 0.038956 0.016021 0.00144 0.025907 0.014119 0.004218 0.00548 0.002116 0.004642 0.001317 0.009105 0.014186 0.000763 0.009008 0.001934 0.006157 0.002039 0.000826 0.003047 0.000539;
12, 0.006574 0.008713 0.038988 0.010971 0.008535 0.013974 0.009623 0.017562 0.001279 0.007431 0.013808 0.004971 0.008979 0.005301 0.000321 0.005214 0.002369 0.003255 0.000071 0.00364 0.000621;
13, 0.006309 0.001324 0.011464 0.003034 0.018929 0.033917 0.004756 0.016611 0.012228 0.011017 0.007867 0.000306 0.000959 0.000293 0.011525 0.003128 0.001633 0.000002 0.000483 0.002769 0.000954;
14, 0.00479 0.01835 0.016558 0.000861 0.014925 0.023515 0.018183 0.001237 0.009726 0.014072 0.00404 0.004132 0.00028 0.012425 0.005908 0.000857 0.003898 0.003538 0.00091 0.002799 0.001191;
15, 0.010411 0.000802 0.016985 0.009138 0.006162 0.019901 0.004118 0.003413 0.01212 0.004181 0.002088 0.000008 0.004302 0.013141 0.003365 0.005 0.009029 0.003545 0.000867 0.000049 0.001738;
16, 0.003376 0.029955 0.005552 0.001059 0.007522 0.025932 0.017323 0.000428 0.013371 0.002039 0.000239 0.000393 0.012889 0.006674 0.012893 0.000401 0.001335 0.000208 0.000703 0.001166 0.001331;
17, 0.005899 0.010194 0.005976 0.018071 0.000017 0.00917 0.017292 0.013394 0.019993 0.009638 0.002888 0.006819 0.000199 0.003858 0.000498 0.001016 0.004162 0.001705 0.000375 0.003888 0.001634;
18, 0.002303 0.016692 0.017512 0.004885 0.02297 0.015379 0.002892 0.026624 0.014126 0.003872 0.002361 0.002061 0.000263 0.005358 0.001792 0.003526 0.007392 0.002612 0.000589 0.002668 0.000884;
19, 0.012102 0.010307 0.016405 0.025596 0.025135 0.001103 0.014636 0.00953 0.002035 0.007411 0.002585 0.011326 0.014214 0.002708 0.008104 0.000872 0.006245 0.000522 0.000019 0.002345 0.000603;
20, 0.080493 0.001526 0.001882 0.004281 0.004025 0.00079 0.002943 0.002856 0.007404 0.00084 0.01268 0.000911 0.003213 0.00646 0.005477 0.008516 0.002729 0.004724 0.006727 0.000245 0.000003;
21, 0.060185 0.00374 0.007563 0.018859 0.015962 0.008397 0.011263 0.005141 0.016239 0.001194 0.018245 0.005911 0.00593 0.005143 0.005687 0.003008 0.00432 0.006233 0.002177 0.000752 0.000049;
22, 0.085675 0.008393 0.001326 0.008187 0.007971 0.008628 0.013086 0.004254 0.002104 0.000748 0.004894 0.00036 0.001792 0.000635 0.000365 0.003239 0.002782 0.000216 0.008884 0.000597 0.000361;
23, 0.007919 0.007471 0.006706 0.040497 0.016871 0.009139 0.006336 0.007237 0.006255 0.017774 0.015751 0.007725 0.003578 0.010975 0.000198 0.003128 0.000263 0.003946 0.000041 0.000033 0.00014;
24, 0.052043 0.001921 0.000773 0.021265 0.008598 0.002079 0.005835 0.000392 0.006376 0.001308 0.004743 0.008552 0.00722 0.002093 0.001648 0.015659 0.004895 0.00475 0.002496 0.001147 0.000019;
25, 0.031974 0.022968 0.004003 0.001664 0.01861 0.004415 0.029216 0.013642 0.012323 0.001535 0.001403 0.018025 0.00695 0.001735 0.003461 0.00596 0.001939 0.000042 0.00262 0.000382 0.000004;
26, 0.045223 0.002707 0.007318 0.026383 0.005528 0.016972 0.019099 0.004125 0.007298 0.000378 0.017384 0.01283 0.002749 0.006888 0.001546 0.003838 0.002201 0.007323 0.00027 0.000619 0.000009;
27, 0.002331 0.038066 0.001567 0.001518 0.025555 0.016394 0.006657 0.015385 0.005177 0.023422 0.008098 0.00586 0.000801 0.007785 0.002735 0.005511 0.001809 0.004711 0.000332 0.000092 0.00007;
28, 0.003937 0.011633 0.019493 0.017428 0.001973 0.00893 0.017583 0.006289 0.002516 0.00732 0.008437 0.006867 0.016423 0.001173 0.010752 0.000511 0.003452 0.007802 0.002297 0.001273 0.000077;
29, 0.00387 0.045505 0.011949 0.013447 0.016632 0.007788 0.001185 0.002647 0.003663 0.005656 0.015184 0.009292 0.009916 0.015145 0.004242 0.003061 0.004089 0.002635 0.000627 0.001464 0.00017;
30, 0.014788 0.035159 0.034758 0.001285 0.0118 0.001598 0.004427 0.015564 0.018955 0.003427 0.004699 0.013617 0.00332 0.002438 0.002516 0.001481 0.004133 0.004965 0.001804 0.00204 0.000249;
31, 0.008042 0.020211 0.038046 0.005854 0.00503 0.00662 0.000684 0.011239 0.002538 0.022968 0.005309 0.00633 0.00962 0.007491 0.005555 0.005341 0.00237 0.002592 0.000283 0.002759 0.000412;
32, 0.002205 0.038956 0.016021 0.00144 0.025907 0.014119 0.004218 0.00548 0.002116 0.004642 0.001317 0.009105 0.014186 0.000763 0.009008 0.001934 0.006157 0.002039 0.000826 0.003047 0.000539;
33, 0.006574 0.008713 0.038988 0.010971 0.008535 0.013974 0.009623 0.017562 0.001279 0.007431 0.013808 0.004971 0.008979 0.005301 0.000321 0.005214 0.002369 0.003255 0.000071 0.00364 0.000621;
34, 0.006309 0.001324 0.011464 0.003034 0.018929 0.033917 0.004756 0.016611 0.012228 0.011017 0.007867 0.000306 0.000959 0.000293 0.011525 0.003128 0.001633 0.000002 0.000483 0.002769 0.000954;
35, 0.00479 0.01835 0.016558 0.000861 0.014925 0.023515 0.018183 0.001237 0.009726 0.014072 0.00404 0.004132 0.00028 0.012425 0.005908 0.000857 0.003898 0.003538 0.00091 0.002799 0.001191;
36, 0.010411 0.000802 0.016985 0.009138 0.006162 0.019901 0.004118 0.003413 0.01212 0.004181 0.002088 0.000008 0.004302 0.013141 0.003365 0.005 0.009029 0.003545 0.000867 0.000049 0.001738;
37, 0.003376 0.029955 0.005552 0.001059 0.007522 0.025932 0.017323 0.000428 0.013371 0.002039 0.000239 0.000393 0.012889 0.006674 0.012893 0.000401 0.001335 0.000208 0.000703 0.001166 0.001331;
38, 0.005899 0.010194 0.005976 0.018071 0.000017 0.00917 0.017292 0.013394 0.019993 0.009638 0.002888 0.006819 0.000199 0.003858 0.000498 0.001016 0.004162 0.001705 0.000375 0.003888 0.001634;
39, 0.002303 0.016692 0.017512 0.004885 0.02297 0.015379 0.002892 0.026624 0.014126 0.003872 0.002361 0.002061 0.000263 0.005358 0.001792 0.003526 0.007392 0.002612 0.000589 0.002668 0.000884;
40, 0.012102 0.010307 0.016405 0.025596 0.025135 0.001103 0.014636 0.00953 0.002035 0.007411 0.002585 0.011326 0.014214 0.002708 0.008104 0.000872 0.006245 0.000522 0.000019 0.002345 0.000603;
41, 0.080493 0.001526 0.001882 0.004281 0.004025 0.00079 0.002943 0.002856 0.007404 0.00084 0.01268 0.000911 0.003213 0.00646 0.005477 0.008516 0.002729 0.004724 0.006727 0.000245 0.000003;
42, 0.060185 0.00374 0.007563 0.018859 0.015962 0.008397 0.011263 0.005141 0.016239 0.001194 0.018245 0.005911 0.00593 0.005143 0.005687 0.003008 0.00432 0.006233 0.002177 0.000752 0.000049;

Which now gives me the same same, but different, but still same final results of:

ImportantFeatures: MFCC1
ImportantFeatures: MFCC2
ImportantFeatures: MFCC3
ImportantFeatures: MFCC5
ImportantFeatures: MFCC6
ImportantFeatures: MFCC4
ImportantFeatures: MFCC7
ImportantFeatures: MFCC8
ImportantFeatures: MFCC9
ImportantFeatures: MFCC11

It kind of feels like there’s some other rotation I’m missing later in the process.

If I take the bases and sum them together willy nilly I get this:
Screenshot 2021-03-04 at 5.51.17 pm

Kind of flat-ish.

And the results of my (and your) weights return this:
Screenshot 2021-03-04 at 5.52.05 pm

So the aggregate of multiplying those together (or rather, multiplying each base by the weighting) can only really return me this:
Screenshot 2021-03-04 at 5.55.23 pm

On a hunch I’ve tried this on a couple other datasets and I’m getting results that look like bugs. Will make a new thread about it.

My hunch is that no matter what kind of dataset I feed fluid.pca~, the values field always looks like the ones I’ve posted above, almost as if it is sorted.

The content of the values list should be sorted in descending order. The weighting process should make no difference to the shape. Looking at your middle picture, my interpretation is that the first PC is swamping the others, which means that your features are probably heavily correlated.

Is this from the dataset you attached above? I’ll take a look at some point. Meanwhile, I can assure you that PCA is not giving the same results irrespective of input, on my machine at least.
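If it helps, here’s a quick way to convince yourself of both points outside of Max – plain scikit-learn standing in for fluid.pca~, so only indicative:

```python
# Illustrative check: the explained variances come out sorted descending whatever
# the column order, but the basis columns still follow the input feature order.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))   # some correlated data

perm = rng.permutation(5)
a = PCA().fit(X)
b = PCA().fit(X[:, perm])                                 # same data, columns shuffled

print(np.allclose(a.explained_variance_, b.explained_variance_))            # True
print(np.allclose(np.abs(a.components_[:, perm]), np.abs(b.components_)))   # True
```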


Ooooooh.

Ok, so if the values are sorted, along with presumably the PCs, what maintains a reference to what the original columns were? Because from what you’re describing, the weights/maths will always look like what I posted, with the massive difference being that the columns of original data that those point to are not in the same order in which they are presented.

What I’m doing at the moment is matching that against my original indices, which are in the order that they were put in the original dataset.

Yeah, the one from above, 21pl (21d total, with pitch/loudness).

I think your bases matrix isn’t getting transposed properly. I get this (which is the transposition of what you have…)

[ 0.085674709490373, 0.0079191852023138, 0.05204293804988, 0.03197420657061, 0.045223146375938, 0.0023314794589638, 0.0039372002040182, 0.0038699399501186, 0.01478845258603, 0.0080417761855798, 0.0022050250123904, 0.0065740141262804, 0.0063085450087277, 0.0047901609930548, 0.010411467483957, 0.0033764051888803, 0.0058990764417425, 0.0023025896685506, 0.012101805358861, 0.080493093461577, 0.060185117980584 ]
[ 0.0083929806658832, 0.0074708199357045, 0.0019209180915349, 0.022967991470978, 0.0027073470102263, 0.038065870652227, 0.011633050231146, 0.045505435276381, 0.035159049747797, 0.020211121723468, 0.038955816681973, 0.008712647950673, 0.0013243673049267, 0.018350288245232, 0.00080243001903751, 0.02995517014857, 0.010193799514974, 0.016692135912115, 0.010306635553661, 0.0015255021385617, 0.0037402818271701 ]
[ 0.0013259181890063, 0.0067061531443768, 0.00077327329365581, 0.0040031781816148, 0.0073180366327661, 0.0015669898440095, 0.019493443795557, 0.011948641810645, 0.034758050974006, 0.038045925735573, 0.016021448524029, 0.038988309109179, 0.011464492399067, 0.016557545347153, 0.016985487799292, 0.0055518478885504, 0.0059763837818836, 0.017512417994185, 0.016405246227172, 0.0018815777467274, 0.0075625428969702 ]
[ 0.0081873810743327, 0.040497147459184, 0.02126545420998, 0.0016642543209231, 0.026382973654522, 0.001518080838795, 0.017428106936046, 0.013446634410339, 0.0012845649821746, 0.0058535227702705, 0.0014400492658014, 0.010970707376913, 0.0030337868457204, 0.00086099918877666, 0.0091376657472844, 0.0010590883053682, 0.018070734801476, 0.0048846771418888, 0.025596328061799, 0.0042812242211347, 0.018858887150459 ]
[ 0.0079707451536959, 0.016870692464001, 0.0085980386323731, 0.01861018087591, 0.0055277453900817, 0.025555044739261, 0.0019728888275963, 0.016632003979259, 0.011800297457422, 0.0050303428954418, 0.025907054740648, 0.008534634829314, 0.018929387319417, 0.01492450881549, 0.0061620632810704, 0.0075217362518437, 1.6637379916186e-05, 0.022969752537689, 0.025134804878052, 0.0040245133606798, 0.015961968726224 ]
[ 0.0086283320876439, 0.009138615379195, 0.0020787787022085, 0.0044148174067338, 0.016972230928363, 0.016394041393093, 0.008930154150839, 0.0077876503369629, 0.0015977628028999, 0.006619856088527, 0.014118608851225, 0.013974084607581, 0.033917438058206, 0.023514721079221, 0.01990058998866, 0.025931885879047, 0.0091700882176388, 0.015379282141517, 0.0011029652547944, 0.00078994419776027, 0.0083972547542706 ]
[ 0.013085592424813, 0.0063355639991285, 0.0058346789807448, 0.029215884704483, 0.019098999656677, 0.0066567974618289, 0.017583188074248, 0.0011848011683011, 0.0044270303215932, 0.00068432311180218, 0.0042184860527867, 0.0096234997928294, 0.0047563292688204, 0.018182901958542, 0.0041179009643925, 0.017322727072062, 0.017291735899902, 0.0028919137597426, 0.014635578143606, 0.0029434597637052, 0.011263005876615 ]
[ 0.0042535343458144, 0.0072372520364664, 0.00039164665544676, 0.013642486055104, 0.0041254428390509, 0.015384842831492, 0.0062893233950345, 0.0026467093890891, 0.015564495882282, 0.011239404633756, 0.0054804104158681, 0.017561656703572, 0.016611366419268, 0.0012369170976294, 0.003412568188145, 0.00042788663179312, 0.013394070856364, 0.026624383302695, 0.0095299495796588, 0.0028561992492781, 0.0051409440422136 ]
[ 0.0021044670026056, 0.006255476236883, 0.00637558497291, 0.012323339462485, 0.0072983199952654, 0.0051769437127049, 0.0025162471424515, 0.0036627295005421, 0.018955340197964, 0.0025377531013111, 0.0021156550536501, 0.0012788637521185, 0.012228320669971, 0.0097257392350349, 0.012119579660571, 0.013371037061235, 0.019993468089938, 0.014125802085269, 0.0020352804473049, 0.0074041742330056, 0.016239149668947 ]
[ 0.00074818141104052, 0.017773935145402, 0.00130787663503, 0.001534900488151, 0.00037757595328643, 0.023421631952837, 0.0073204467152689, 0.0056564505684015, 0.0034269890851497, 0.022968179346448, 0.0046424680637725, 0.0074313854357354, 0.011017288098414, 0.014071536338836, 0.0041811675650504, 0.0020388421261906, 0.0096380867586319, 0.0038724805158962, 0.0074110009535502, 0.00084002778905312, 0.0011936966393881 ]
[ 0.004894445578257, 0.01575126255459, 0.0047431122935968, 0.0014026277184994, 0.017383540737843, 0.0080976580766987, 0.0084367439532285, 0.015184219282728, 0.0046992969775539, 0.0053086668330665, 0.0013173870891018, 0.013808140413974, 0.0078669175758944, 0.0040401403131575, 0.0020880787421356, 0.00023939976594711, 0.0028877453580486, 0.0023606185594488, 0.0025845897418905, 0.012680365320616, 0.018244831095103 ]
[ 0.00035981599887115, 0.0077252051087999, 0.008552128487124, 0.018025306293651, 0.01282992105167, 0.0058604289146545, 0.0068665120172219, 0.0092915605129709, 0.01361737632363, 0.0063299185241133, 0.0091048847600762, 0.0049705751400286, 0.00030581164443482, 0.0041319240220118, 7.7501491614522e-06, 0.00039349297613731, 0.006819488899209, 0.0020614784479674, 0.011325672557314, 0.00091123193961607, 0.0059106866063441 ]
[ 0.001792158625188, 0.0035776577180831, 0.0072196935843671, 0.0069498090126289, 0.0027487229956641, 0.00080146443817238, 0.016422676939613, 0.0099163006965667, 0.0033203350246128, 0.0096196235480229, 0.014185594344144, 0.0089786071873625, 0.00095866266326462, 0.00028034860155236, 0.0043021300502587, 0.012888597870793, 0.00019863522004347, 0.00026259446732948, 0.014214451716567, 0.003212536139351, 0.005930109087753 ]
[ 0.00063546264238881, 0.010975098563057, 0.0020928306731365, 0.0017351265694202, 0.0068883102499328, 0.0077848376506567, 0.0011725604767238, 0.015144804013246, 0.0024383654679059, 0.0074914035871718, 0.00076259006857333, 0.0053007412528261, 0.00029325808233155, 0.012424831646323, 0.013140794596996, 0.0066739916667348, 0.0038582285033647, 0.0053577605833942, 0.002707600068163, 0.0064597413208824, 0.0051429354840724 ]
[ 0.00036483447658797, 0.00019843009290776, 0.0016478141104618, 0.0034611838971775, 0.0015455148875124, 0.0027353308733717, 0.010752217084163, 0.0042416633527924, 0.0025158109467384, 0.0055545949073104, 0.0090076013365739, 0.0003207493838618, 0.011524614917701, 0.0059080343974163, 0.00336494261777, 0.012893463133743, 0.00049834822321349, 0.0017917505498284, 0.0081035859795332, 0.0054768320114716, 0.0056869174896271 ]
[ 0.0032390021988444, 0.0031284440710511, 0.01565910883367, 0.0059598953586971, 0.0038377091061078, 0.0055109753907824, 0.00051057752607999, 0.0030607526543794, 0.0014814146215818, 0.0053410359886931, 0.001933751266325, 0.0052137148092426, 0.0031276493056885, 0.00085742306205917, 0.0049996416980219, 0.00040089948321749, 0.001015788135776, 0.0035255782639993, 0.00087237658158395, 0.0085155817193951, 0.0030080709663397 ]
[ 0.0027818564326403, 0.00026251080354188, 0.0048951511733444, 0.0019392339884695, 0.0022014757564584, 0.0018093996468832, 0.0034520823684344, 0.0040887207157837, 0.0041327716873124, 0.0023699646967616, 0.0061565326112835, 0.0023687891819669, 0.0016332229443169, 0.0038984782231403, 0.0090287061229179, 0.0013354006069786, 0.0041622045756598, 0.0073922617185721, 0.0062446162496238, 0.0027291503162106, 0.0043204200136989 ]
[ 0.0002155332869862, 0.0039455993198548, 0.0047497210349999, 4.1748577909669e-05, 0.0073226753480411, 0.004711000680639, 0.0078020174133999, 0.002635216631746, 0.004964714796884, 0.0025923686231001, 0.0020391867416874, 0.0032545966117301, 1.8551000095464e-06, 0.0035384790511058, 0.0035449905139284, 0.00020845567200302, 0.0017050394197346, 0.0026118598737157, 0.00052241194420062, 0.0047238194763051, 0.0062325631549107 ]
[ 0.0088839270752737, 4.0568891407827e-05, 0.0024958736149295, 0.0026195699400858, 0.00026988597726977, 0.000332430767527, 0.0022967043494426, 0.00062674501363831, 0.0018035926539118, 0.00028322394579333, 0.00082553494308043, 7.1404840136407e-05, 0.00048342179455632, 0.00090992019686476, 0.00086701056406186, 0.00070254804690833, 0.00037453317033605, 0.00058885206454985, 1.8686243082883e-05, 0.0067265689858629, 0.0021768738145038 ]
[ 0.00059739790222988, 3.2969012034721e-05, 0.0011466346379692, 0.00038206127138069, 0.00061890339091356, 9.1672912327816e-05, 0.0012727421886293, 0.0014643378832309, 0.0020397368274985, 0.0027587688104809, 0.0030470828798425, 0.0036403986200049, 0.0027693861728218, 0.0027985492945377, 4.8761270448789e-05, 0.0011662903738832, 0.003888149049831, 0.0026681289820816, 0.0023446388390227, 0.00024547268739985, 0.00075214525906851 ]
[ 0.00036147074088183, 0.00014030759569665, 1.9387009023901e-05, 4.4459503052122e-06, 9.3445801047111e-06, 6.9819100689888e-05, 7.7026764772544e-05, 0.00016970274830559, 0.00024860370048162, 0.00041201858119262, 0.00053877102244087, 0.00062065864186089, 0.00095447158443799, 0.0011907234619355, 0.0017384499577933, 0.0013305691440203, 0.0016343095520464, 0.00088441451467647, 0.00060346208432582, 2.6532030450951e-06, 4.9062642529061e-05 ]

@rodrigo.constanzo Just looked at your patch. I think jit.spill might possibly undo the transposition.


Hmm, maybe the dump out in Max already transposes it somehow? I’m running it through jit.transpose, which does appear to be flipping it:

This is straight out of @weefuzzy’s js pca_dump.js, with the first one being direct and the second one after jit.transpose. This is also pre abs-ing and any other stuff.

Perhaps I’m supposed to transpose again after the weighting step. That would give me the same results as you.

Hmm, if I print the output it looks like it matches the transposition.

Is there a (not brutal) way to transpose a matrix in list-land?

edit:
It looks like the transposition is retained after spilling:

Aha!

It was a post-weighting re-transposition that sorted it. It’s possible the previous transposition was getting fucked along the way, but after some very ugly/hacky patching going between colls, dicts, lists, and jitter matrices I now have this:

ImportantFeatures: Loudness
ImportantFeatures: MFCC5
ImportantFeatures: MFCC9
ImportantFeatures: MFCC4
ImportantFeatures: MFCC8
ImportantFeatures: MFCC6
ImportantFeatures: MFCC19
ImportantFeatures: MFCC12
ImportantFeatures: MFCC2
ImportantFeatures: MFCC10

Screenshot 2021-03-04 at 7.30.30 pm

Which matches this:

So I just need to figure out a streamlined way to compute this stuff that doesn’t involve jumping between every domain of Max. My jitter knowledge sucks, so I was only able to do the simpler transposition/abs-ing, but couldn’t figure out the square/weighting and subsequent row summing.

Most likely my js is constructing the matrix transposed in the first place. So you should be able to remove both jit.transposes.

As for jumping between domains, I’d stay in jitter as long as possible, otherwise there’s going to be a lot of uzi-ing

That doesn’t appear to be the case. If I take out the jit.transpose I get the problematic values above.

I was thinking the values need to be transposed as well, but since that’s a single row, it doesn’t make sense.

So it seems like I need to transpose the bases, weigh them against the values, then transpose it back to sum the rows. I guess I can sum the columns instead at the end? I’ll test that now.

Yup, that works. Much easier to do that with zl objects too.

It’s possible I misunderstood this from @tedmoore’s original post, but what appears to be working is now doing [ (a+b+c), (d+e+f), (g+h+i) ].

With that being said, I’ll try and tidy things up and post a sensible version of this code. It still jumps between domains a bit (actually, scratch that: with no need for jit.transpose it can just jit.spill from your js code and zl everything else after that).
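For reference, here’s roughly what I think the working version boils down to, sketched in NumPy rather than Max – illustrative only, the function name is mine, and whether the loadings get squared or just abs-ed before weighting is a detail of @tedmoore’s recipe I haven’t pinned down here:

```python
# One reading of the ranking recipe: scale each component's (absolute) loadings by
# that component's value, then sum over components to get one score per input feature.
import numpy as np

def rank_features(bases, values):
    # bases: (n_components, n_features) from the PCA dump; values: (n_components,)
    bases = np.abs(np.asarray(bases))
    values = np.asarray(values)
    feature_scores = (bases * values[:, None]).sum(axis=0)   # sum down each column
    return np.argsort(feature_scores)[::-1]                  # best feature indices first
```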


Ok, here’s the much simplified and tidied code. You feed it a fluid.dataset~ and it does the rest. (In my case I have a coll with the column labels.)


----------begin_max5_patcher----------
4350.3oc0ck8iaajl+Y6+JHDlGlcQaM08w7TNlMYWr1yDrwKBFDLnAaopaSa
JRARp11Yv3+1mupJJ15fGkjptcm7PmJ7npuie02U8Qk+4KewraJ+jodVxeN4
WSdwK9mu7EuvcI6EdQ6+8KlsJ8SKxSqcO1rByGKu48ytxeqFymZbWtJ4ilr6
dWS816TtoI2zz740F+rOaVx+n8VqSaV7trh6ttxrnweWLBglitJQhmi26eXW
kvk16PHyQcyPwlUYEvz6HIzCWzuntqhauZ1RG8Az7qvjY1q8ud4Ks+4pupLL
iglyer4WbD425C42dXJoFaIcERMmS4BLRo4XMBqz.WoGmqv8xUni4J04vTqL
00o2YNVKZRWdtHVMwwQT1bshRHDrjQnZMGTqT53LKITUnLhZvklaMU4ke773
WIwwSRhXN2xjbpfJHbkh.5V14na6gcEwje+s7j6pJ2rN4a9sb34qy9MSBk.F
ZFV.b0TBAO.WJjy4BIWQwDMCoPTPmi4hyQoS5QJHhqTndypGAYPe.ALSEKY.
KhxfljkaVsN4ljEk4Cau19LVt+lzh6r+a2COljfP71zj74fcNBSCXADAqERq
MvyYKA8XAAUGSmVvZYRtMqooJsn91xpUmONXK2ef8OoB1Kn0QBFPwQm6G2j+
TbMlysrFFC7FrSfpTRgRSIVzufFItFKioONPcm7pW8JmJOyTzLu.z6o4fofy
yS.Abka8wgEy0XMQJkHtBwv18+Tdjb7w0QVDzg32SVDUAyVrAGvALIVHDLB3
kfABKBSFKAiHlBl0UYEMmYDrJWDrTHbUXS.EwAVlqDDIDBKkDKtkFw8+2luI
a47koMo0llubpp9stEtMuLsYR6DakNZ0bE3hjRITIUnXDv2.VKijyAlH5hmN
AwWFTRjmU2DtjX6tBMDjHFRG.x7Qw3RAhB9InwxKoJhBh2m0L2gLVWVOLd.d
pqWk1Tk8oIEAbE2BF3D7bIWyvBN0tYAisY.dVhfdbZDyjau27o0UI+gawI+m
veImYFC9.EE.yKnXNDlDDb.iJIJfeQwJCILhG0HEuIIOIeRi.Spxo9..YT8b
EgRgbgQfL.bTZuVrhMDiXOmyT.1p6..ftVnA2CDj.bOnYjnlo.HYezP93KB4
yEj4RveHCz3HlFxXBz9JQjxUVqeTX6+jcCex2TuHMOsZU4RSxkIDD.HPHnPx
QPRBHAlKs4JKiU.BpX5AbY1hl4aJVmt3CI2mluwT+muL.PeF7wXbjP.Rcj48
gSMFtYVYQZ0m6L9EXYBrwAsWdRLXO.2Eazop8Y8HA3aI45z6MKuFtFP.WmBQ
6mcCjrWcqbnUP7hYlU2XVtSgEsjdU5JSio5ZSQ5M4lAt4pz0q6ts6tNw9oJ7
WTtZk4gHt6j9o.iBYiTdaxZS5GpSZJSZR+PWvG4YElE1mXWoSewZg8EOFCoa
HURIGggrTEH.9ssjD1RSdgUfkiiYJH7yLYKLgrMaKxgrJQeN.rd1hwXwjUwn
ysVyJ9S.udV4ZMDh9VSZylJScRVQRy6xpSZy65zgzsgJBtL1CRirAOKhDjlR
hIuCl2F8TRjCVBIhynIVdwLDQskBtsrnwFMm8V54nnxlnoYSh0c2troVDS1j
FO1b.GitRjj7+rZcYUSZQyOzhqmM11UuooCKMBUYS9mRizAgIi7gkjA94FKX
uISzuks68DwNu5e0S39RQzqHb8mWcSY9ky3GVQTh07rLVLtHlQ6snLOO41Cf
x8y4WcJp98OJDBRctE3nmf8Dc60sT+0V+IGFd2CNTvaCga6i8qtGr6Qg67Ay
m2MXO3JtX9a4827Ce+2iAlt8ldIduS.d3IfDzDPFdBnAMAzgm.VPS.a3IfGz
DvGdBDAMAhgm.YPSfb3IPEzDnFdBzAMA5Q.RnvPRigECDLNBZDGFbDOBdDGF
fDOBhDGFjDOBlDGFnDOBpDGFrDOBtDGFvDOBxDGFzDOB1DGF3DO.57mx.i4g
YkZ.v4qK2rrvXc27vjzNDlsyK67Nq2qAeLY0YkENaDmed2CG0Sk49K2y+gG4
mfogvAhkmetJxU4MOagIAih.eevg5nPPNLJRr36HGfaMDA+kvy9VT7v71nJL
DdKNV7LJlEb.MGKXLsJAFHwPP41Abfhw1AJhRRrCzHgx+LPJJR2yH.rr6JJr
f6dXJhvrCDZh1Of0dEIA6WBFD6malEvUb2hhnB2RvXRs6BBgpcEjBb6ZhbOB
GnF6KQPbsV1acBnWRuVRnpiyDkybnUl9xKex40zFaxaxfMjKMUABMqARn4y9
JQhmjo6sJCTN0ijESET91LpIgeLjxGxLdvJqZYBX0.IgmXcar.50623qeuMz
lzl6uQqWFXOOZb0V9QtsxYQg2dhZSiH0bFdQPe8dMlQh0YyF8SmK8l5+3e3V
7+wYp1kN6A8dXrJdjNQJLJ9cq3n8r4zsdASMHaKhkSak94caKuU226QwpiUX
ppeu1qVshm9Os1n0qVRRjaQo50Y44IeisarxME207tHAS5ssEvLbj5SINI5l
H7ct8Yy1sGLZespmjEq35iYm5795j0KRu1x1ygw2.6QNyOCo1NSq2SoGSh1m
0A9DOl9ay.DMD7nONtGX0Yidp8Qu7.diPv5+kjuAXxkYqLE1hQTmfhTyQ11D
881DsXYz9FB16LSbSlKupC9v+bzm856KopK2TsXK609czAQq2QSKM0MYEoMs
Uo4W6Rj08P8pMBdkvArR1LN14gJqr4RYEOO9KMytz3tGZU1x0kPRGsxUF2Wv
.DnA26TQregD9iEom6cLiftHFQHBfQTQPYICXcrmgyNOzABLLFgcYpK3y0BI
gJPbIlhA2PVQEcnadonLDKDRWNFoSUt1v.19Nmnkbpx1qmvHHw4DIjuQ+26B
IbpFHIxDzsjLFcSz9OlCDHTUHInAffCgHC01NKA6auhQtGGImywJACxdBgDH
lsXXsNT64dWrhhGhhhEAvrHHHgHF13PnP1fJ7VZtnkRqCXkNjdN.vnP9fD0z
4.hfwUXEEhafZ+JWzb5.2K1F0BhSThGCGCpfVZZDTWJQPpK9X9fZUWBIetfI
4XpzVkVvFJj+qVwF3dWJgSClvGwfpyHRuD9vL0k5BKDc6Vr2ksRhSva4ksRg
XyTFEdhDf6HULVIQH5IoLFqj7D7CzORt08uDhwmqnbglvDbrPy0cH4dt2kR2
grETDEcQPwnGgEhGhRmFCWxbUHI3DC3EOD3kLJ7THFbnwHO.NKHq+wHJMdHn
bVTDejfXpXrgxk4LdJzWLzTLVn37KMTFF8IakBQOEAYmK2qI0R5XsROI18ba
+mxANMJROQ.qDOFlHng3dhFE8zSDxKnpuMZR+bs6.+3bxbIixnHFVJPBrdmj
2O9dWZdtg3piGmLpCNEMTTJczjkfgGskZJKNhXfvvAvS3iC+wWX4z0qu2Tss
m8bqwrUouurpqIaA3Xg++zUA6YUl665wOeKvNKsZw6xZLKr8mtqL1exVPF26
VBYwWrIqMQdWKFNy0PLGTM6tyO3a2rLq7maRa1Te8aLEa7kt29ygU5l7l8EE
2b2sY44KJy8z2tmIw1Z7Oye2q5Z471m01EKDMCiU1VYgBIkRDtQv.6W.Qmlq
8cvaeIFmos8MGLxV3etajhP4T6HzAuF4g0Bg09U.oU1JjZGAWB1Di280RKty
e.ID2AJz0Ykk1unkVwN7p5tmeSS4cUoKcMbwQmqxUsHoJ3tc8gzrVYpqal19
N6h75TFudyhzwU.6IP0VyOZKqormduzMhYqhpqegDcfzzEK.BZu2kR3L2qRY
bPB4FI8WaG4i6CGZK4AX6+1ZSQxOmVTm7ylUY2Tlu7gCvwjaVc3pHUBgmZTH
rR4GAKn6ibemkwdnRtW75rB69CSmdjpIXmdjzADH1enD1WOtOPyU0C2ZQzTP
z3ecLRb.yEBdtSWO.jNPX19fZUGTVY+XXT9QGwUGBpEa0TB6+H8i7yziBltq
czxMtuv48UsDPd5nFr87Fb6FgD1c8Iz7NSeW3Fh2j1zL1FBK771zslmw6Aa2
1cd1lSMZDzO8sMW6dMzIsO0iH5azt3wd1lFlJ+vsoeaUVZ9ro2c45qvs+Yvs
SZt0xs2jBShbvcKZmdv1oE4FvyzCuI1O0ZvxD2yC.ZWR+88dvfesHuI7.DMI
ZH5e4s+vI.kwBJfAb65kLgV2NBomDIyQRgS2fDbJ1OEXBlLtCm+6Ourp7NSw
acXgwb0vDXLxGdflHXN.AWiAcy9KPWXd6Chrgj3LhAzDW38Lh7+tMBP392Xf
sgw380Z2M3dT41.ZdNfwCyI+W8Xm5GhqhFD20JKYKNAXtx9S03U9As19rWhS
mDlyZML12nQv4+novbe5rIPoCZu9BsvGJQelI.fOMw5YiGOTE1F43wiNKT7X
fUcrwpe2llFXoGAwdriVUmbB0NBGax5uXiq34FQ8ioYEeIPpBTjclpOdT7Is
5O59DBGg3NxFBtcq5C+4QglHOqHp+2c+JeBxD8fV15M3f.MvMAGOZBQStFwT
d8ZyxwjU8ICDA3J.2iXMlj8ab+jPtno5Dsh73QP11j8Km.raRyGWBJLlb1es
b419dNbLxYrOIXOsO1tL+qSaBYBCHQkZ1r5lwIl9rhbrSyGAR6mxVzLsr5qN
.9mpLc+hX8LJZiedQ4ZyoXw3r1S8Pwt0T6+7XgEda5MmJHHzs6e8PNuExF5l
ICftmj2Pg5h7QLRn2VdmMejSToPkBlD6cFwnXWNRr8NgjuhQA++uZhCV5IT7
9Ko2ar+P2GwDFN6nB24a59BYtkoUe3UE1+OPzqbUN3Dw8gVg29qJAqyLfr6H
Z5NrFkaZNtPEmX8TufZngmXORzq7PzqeVLOhw26zglJDp8yNKFSEYpynY2yl
3lbCXY1eJheWY4GlpnWObDHGMX3ydXzZehhXIsBaYddCrFH97XYb58qyJ9vo
T70.MIz2wkABCjpyKNUff2a5DyB30FBad0d+cv3eBXENxzX.uywv9GNSB2iS
2cTDJka3z1Sn02mjy6dLNNJaS9fOmrew1ZOesCVJ2FJwzgPeXsQPZVa2Vn0t
eOWcijJc.aA6hXf2Y9bKjId70pzOIrerygxSGHfcArMr0G09IEIaQsm4A0DM
F18MQGb3tLBkSz66GyZON55g89w6oeB6zOVqHDlnpqSKNdzHVqjJaWocUuid
BcoeI8o196NBSK5+15OshtsSO8E8R.3+wJ36x2XdE9DyBhhks81vCFizQ2DT
KAVU9whSlBeZLR5ovu+yomNAhHZtu0KjRJk326PvPPIwm.+wJi4LnPmzqqlZ
XgMZy3SbuArrUzjdxjmhgDBe09DHrv4QQJzXMO9z3+mY4ISextX2vc46fZ8X
Da56uaxyK+3oKBIbe6EIUXkKfSDWy6qi41uep3ZDR8Xv.jeOx.tiu3GKyOCH
RWIj3PjPbemdsqKsnPfq2Tsd7xjdTvoLjOtFBGIzNLLQwEDUfQnI65cSFlCl
Mb11rVOh5VyJyBS18iexc8gY5xT9AhCsaihFGhymhhEVbBR9IAD8lvLCrCpa
8eHa6scXSfNpdRpMEKqet4A4Agb7260a+8hjXjzeVSLAR29YDHwQE6zjt9Dp
R3qSaJSdsMKzYQa8qaL2tIOuYxCX4vddUw4XeYsE1e7y7ifcWicfW1u0Eo6Q
Ecg3aqfPW6nFUl5j4IhRp7wYP6PxThlpUQHU8uhxCS8yquWfOZqlymOAf+eI
qp4yI+W2UNUYwwZ6OL0s14YsG5sv9+HM2uikGze1I7oE7fngJl7DuPVEsK8W
Zq9ErK0d.Smol1+sy4NAOqnpdcqJz8CW2K+Wu7eibe7Jh
-----------end_max5_patcher-----------

edit:
It will crash Max if you open a dataset with 0 range though.

Like actually crash? Did you get a crash report?

I did, but it was jitter complaining, hence me not including it in the other thread. So it could be the js or jit.whatever, but it didn’t crash once I disconnected stuff from that part of the patch.

Just double-checking this, as per @weefuzzy in the thread about IQR-ing stuff: PCA wants zero-mean input, which would mean either standardize or robust scaling, whereas my process here (based on @tedmoore’s) uses normalization before fit-ing the PCA, so that produces 0. to 1. instead.
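For reference, here are the three scalings side by side in scikit-learn terms (standing in for the FluCoMa normalize / standardize / robustscale objects – only the latter two give the zero-mean-ish input being talked about here):

```python
# Quick illustration of the difference: min-max normalization gives a 0..1 range
# but a non-zero mean, while standardize / robust scaling centre the data.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

X = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=(100, 3))

X_norm = MinMaxScaler().fit_transform(X)    # 0..1 range, mean is NOT zero
X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance
X_rob = RobustScaler().fit_transform(X)     # median-centred, IQR-scaled

print(X_norm.mean(axis=0))   # somewhere around 0.5
print(X_std.mean(axis=0))    # ~0
print(X_rob.mean(axis=0))    # near 0 for roughly symmetric data
```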