Hi everyone,
I am currently working on a project where I analyze a corpus of environmental recordings using various FluCoMa audio descriptors. Since I am combining different features (MFCCs, Spectral Descriptors, and Loudness), the raw feature values have completely different scales.
My goal is to use Principal Component Analysis (PCA) to reduce the dimensionality of this dataset, and then use a KDTree (fluid.kdtree~) to perform a K-Nearest Neighbor query. I want to input a new audio file (a recording of a musical performance) and find the environmental soundscape from my corpus that is closest to it.
While studying the FluCoMa help files and examples, I noticed that the pipeline often feeds the dataset directly into fluid.pca~ and applies fluid.normalize~ only after the PCA object.
As far as I understand the theory behind PCA, the algorithm is highly sensitive to the scale of the input data. If features aren't normalized beforehand, variables with larger numerical ranges (like Spectral Centroid in Hz) will dominate the variance calculation over smaller ones (like Loudness or linear amplitude).
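To make my concern concrete, here is a minimal NumPy sketch (not FluCoMa code, and it says nothing about what fluid.pca~ does internally). The two features are synthetic and hypothetical: a centroid-like value in Hz and an amplitude-like value between 0 and 1, generated independently. PCA is computed by hand via an SVD of the mean-centered data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical, independently generated features (not real descriptor data)
centroid_hz = rng.normal(3000.0, 1000.0, n)  # large numeric range
amplitude = rng.normal(0.5, 0.1, n)          # small numeric range
X = np.column_stack([centroid_hz, amplitude])


def first_pc(data):
    """Return (explained variance ratio, absolute loadings) of the first PC."""
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return explained[0], np.abs(vt[0])


# PCA on the raw features
raw_ratio, raw_loadings = first_pc(X)

# PCA after standardizing each feature to zero mean and unit variance
Z = (X - X.mean(axis=0)) / X.std(axis=0)
std_ratio, std_loadings = first_pc(Z)

print("raw:          ratio=%.4f loadings=%s" % (raw_ratio, raw_loadings))
print("standardized: ratio=%.4f loadings=%s" % (std_ratio, std_loadings))
```

On the raw data, the first component lines up almost entirely with the Hz feature and accounts for nearly all of the variance. After standardization, both features contribute equally to it.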
Given my specific pipeline, I have a couple of questions:
- Why do the examples usually perform normalization only after the PCA? Is there an internal rescaling happening inside fluid.pca~?
- In my case, since I am mixing descriptors with drastically different units and scales, would it be a better practice to perform a double normalization? Specifically: standardizing/normalizing the dataset before PCA (to give equal weight to all features) and then normalizing the coordinates after PCA (to scale them nicely for the KDTree query)?
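For clarity, this is the "double normalization" pipeline I have in mind, written as a rough NumPy sketch rather than as a FluCoMa patch. Everything in it is an assumption for illustration: the helper names (`fit_pipeline`, `transform`, `nearest`) are hypothetical, the corpus is random data, and a brute-force Euclidean search stands in for the query that fluid.kdtree~ would perform.

```python
import numpy as np


def fit_pipeline(corpus, n_components):
    """Fit standardization, PCA and min-max scaling on the corpus only."""
    mean = corpus.mean(axis=0)
    std = corpus.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant features
    z = (corpus - mean) / std                # step 1: standardize before PCA
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    basis = vt[:n_components].T              # step 2: PCA basis (z is centered)
    projected = z @ basis
    lo = projected.min(axis=0)
    span = projected.max(axis=0) - lo
    span[span == 0] = 1.0
    return {"mean": mean, "std": std, "basis": basis, "lo": lo, "span": span}


def transform(params, points):
    """Apply the fitted steps to corpus points or to a new query point."""
    z = (points - params["mean"]) / params["std"]
    projected = z @ params["basis"]
    return (projected - params["lo"]) / params["span"]  # step 3: scale to 0..1


def nearest(reduced_corpus, reduced_query, k=1):
    """Brute-force k-nearest-neighbour search (stand-in for a KD-tree query)."""
    dists = np.linalg.norm(reduced_corpus - reduced_query, axis=1)
    return np.argsort(dists)[:k]


# Hypothetical corpus: 200 points, 3 features on very different scales
rng = np.random.default_rng(1)
corpus = rng.normal(size=(200, 3)) * [1000.0, 10.0, 0.1] + [3000.0, -20.0, 0.5]

params = fit_pipeline(corpus, n_components=2)
reduced_corpus = transform(params, corpus)

# Query with a point taken from the corpus; it should be its own nearest match
query = corpus[17]
match = nearest(reduced_corpus, transform(params, query), k=1)
print(match)
```

The point of the sketch is that the query goes through the same fitted mean/std, PCA basis and min/max as the corpus, rather than being normalized on its own.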
I would love to hear your insights on the best practice for this specific workflow.
Thanks in advance for your help!