FluidSpectralShape = spectral moments?

pasquetje · April 9, 2019, 5:59pm

Hello !

FluidSpectralShape reminds me of what is used in OpenCV. Can I say that FluidSpectralShape analyses the first 7 spectral moments, or are there differences?

tremblap · April 9, 2019, 7:05pm

The helpfile should tell you

Outputs 1 to 4 are the first four spectral moments; 5, 6 and 7 are three other shape descriptors. All of them are computed in linear frequency and linear amplitude space, and some are then converted to dB at the output.

Here is the documentation as it stands. Feel free to comment on any bits you do not get or that are not clear.

This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.

The descriptors are:
• the first four statistical moments (see "Moment (mathematics)" on Wikipedia), more commonly known as:
• the spectral centroid (1) in spectral bins as units. This is the centre of mass of the spectrum, i.e. the average bin index weighted by the magnitude at each bin.
• the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelope, i.e. the magnitude-weighted root-mean-square distance to the centroid.
• the normalised skewness (3) as ratio. This indicates how tilted the spectral curve is in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If the energy is concentrated below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, the skewness is positive.
• the normalised kurtosis (4) as ratio. This indicates how focused the spectral curve is. If it is peaky, it is high.
• the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included.
• the flatness (6) in dB. This is the ratio of the geometric mean of the magnitudes over the arithmetic mean of the magnitudes. It yields a very approximate measure of how noisy a signal is.
• the crest (7) in dB. This is the ratio of the loudest magnitude over the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overall spectral curve.
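To make the seven definitions above concrete, here is a minimal Python sketch that computes them from one frame of linear magnitudes. It is not the FluCoMa implementation: the function name `spectral_shape`, the `eps` guard against log(0), the use of squared magnitude as "energy" for the rolloff, and the `20 * log10` dB conversion are all assumptions made for illustration, so the numbers may differ slightly from what the object outputs.

```python
import math


def spectral_shape(mags, rolloff_fraction=0.95, eps=1e-12):
    """Return [centroid, spread, skewness, kurtosis, rolloff, flatness_db, crest_db].

    `mags` is one frame of linear magnitudes, indexed by bin number.
    Illustrative only; details are assumptions, not the FluCoMa source.
    """
    n = len(mags)
    total = sum(mags)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty frame with some non-zero magnitude")

    # Treat the normalised magnitudes as a distribution over bin indices.
    weights = [m / total for m in mags]

    # (1) centroid: magnitude-weighted mean bin index.
    centroid = sum(k * w for k, w in enumerate(weights))

    def central_moment(order):
        return sum(w * (k - centroid) ** order for k, w in enumerate(weights))

    # (2) spread: standard deviation around the centroid, in bins.
    spread = math.sqrt(central_moment(2))

    # (3) and (4): third and fourth moments normalised by the spread.
    skewness = central_moment(3) / spread ** 3 if spread > 0 else 0.0
    kurtosis = central_moment(4) / spread ** 4 if spread > 0 else 0.0

    # (5) rolloff: first bin at which the cumulative energy reaches 95%.
    # Assumption: energy is taken as squared magnitude.
    energy = [m * m for m in mags]
    threshold = rolloff_fraction * sum(energy)
    cumulative = 0.0
    rolloff = n - 1
    for k, e in enumerate(energy):
        cumulative += e
        if cumulative >= threshold:
            rolloff = k
            break

    # (6) flatness: geometric mean over arithmetic mean, converted to dB.
    geometric = math.exp(sum(math.log(max(m, eps)) for m in mags) / n)
    arithmetic = total / n
    flatness_db = 20 * math.log10(max(geometric / arithmetic, eps))

    # (7) crest: loudest magnitude over the RMS of the frame, converted to dB.
    rms = math.sqrt(sum(energy) / n)
    crest_db = 20 * math.log10(max(mags) / rms)

    return [centroid, spread, skewness, kurtosis, rolloff, flatness_db, crest_db]
```

With a perfectly flat frame such as `spectral_shape([1.0] * 8)`, the centroid lands in the middle of the frame and both flatness and crest come out at 0 dB; with a single non-zero bin, the spread collapses to zero and the flatness drops to a very large negative dB value.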

The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding.

The process will return the seven values as a list, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the server signal vector size.
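The repetition described above can be pictured with a toy model in Python. This is only a sketch under simplifying assumptions: `held_outputs` is a hypothetical helper, window-size latency is ignored, and a new analysis result is assumed to become available each time another `hop_size` samples have elapsed. Between two analyses, each signal vector simply reports the latest result again.

```python
def held_outputs(frames, hop_size, vector_size, num_vectors):
    """Return, for each signal vector, the most recent analysis result.

    `frames` is a list of already-computed results (for example lists of
    seven values). Toy model only, not the actual server scheduling.
    """
    outputs = []
    latest = None  # nothing has been analysed yet
    frame_index = 0
    for v in range(num_vectors):
        samples_elapsed = (v + 1) * vector_size
        # Consume every analysis whose hop has completed by now.
        while (frame_index < len(frames)
               and (frame_index + 1) * hop_size <= samples_elapsed):
            latest = frames[frame_index]
            frame_index += 1
        # If no new analysis completed, the previous values are repeated.
        outputs.append(latest)
    return outputs
```

For example, with a hop of 128 samples and a vector size of 64, each result is reported for two consecutive vectors before the next one replaces it.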