Trying to have consistent results with fluid.bufpitch~


In order to test pitch analysis (fluid.bufpitch~) I made a very simple sound bank of 128 samples, one per MIDI note, played with a pitched percussive sound in Chromaphone. The 128 samples are copied into a buffer and the original boundaries are used to separate the segments (each lasts 250 ms).

Global issue: I can’t get fluid.bufpitch~ to sort the sounds correctly according to their pitch. The progression across pitches can sometimes be globally correct, but the analysis quite often goes backwards. This happens with more or less any window size up to 8192 samples and any analysis algorithm.

If I display the features using fluid.waveform~, I also see that the raw pitches seem to be the same for several consecutive segments which actually have different pitches.

I’ve tried to filter the pitch values with a confidence threshold using fluid.bufthresh~, but it doesn’t help. A high threshold often even leaves some fragments with no pitch associated at all.

Is there some issue with my patcher or is pitch analysis still such a difficult task in 2023?

Patcher and soundbank are attached below.

Pitch analysis (7.9 MB)

Although it’s a bit old, there are some interesting ideas/theory-crafting in this thread here:


The example is still in the distribution under another name: analysing-pitch.maxpat for Max.

I’ll check the sounds later, there might be something in them though…


I’ve examined that patcher and integrated weighting and stripping into my own one, but maybe I’ve got it wrong. Also, with my segments, applying a confidence threshold can result in some of them having no pitch data kept at all, which is a bit of an issue.

In your patch the buffer that you’re plotting on top of the waveform – pitch – appears to be the result of a single analysis, not the overall pitches for everything. So, whilst it’s clearly struggling to sort by the pitches we hear, that plot isn’t much help in reasoning about why (globally).

is pitch analysis still such a difficult task in 2023?

It is! I mean, the two useful algorithms in fluid.pitch are pretty old – yinfft is from 2006, and hps is from 1969 – but I’m not aware of anything that’s ‘solved’ the problems of pitch detection meanwhile. It’s hard because the ways that people resolve pitch aren’t fully understood, but are pretty complex: people appear to be able to draw on different forms of ‘evidence’ in auditory perception, and our attention and cultural history of listening also play a role. There’s a good discussion of the physiology / psychology in Richard Lyon’s Human and Machine Hearing, but I don’t have a reference for the social-cultural aspects off the top of my head.

Meanwhile, most pitch tracking algorithms approach the problem in terms of trying to estimate periodicity and (sometimes) trying to combine that with some harmonic analysis of the spectrum. The latter of these is quite error prone in a (discrete) FFT setting (sometimes constant Q transforms work better). Meanwhile, reliable periodicity analysis on real signals turns out to be really hard because most things are only quasi periodic, and things like autocorrelation measures are very easily confounded by harmonics etc.
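As a toy illustration of that periodicity approach (plain Python/numpy, nothing to do with FluCoMa’s actual implementation), here’s a naive autocorrelation pitch estimate; the comments note where real material starts to confound it:

```python
import numpy as np

def autocorr_pitch(signal, sr, fmin=50.0, fmax=2000.0):
    """Naive f0 estimate: pick the strongest autocorrelation peak
    between the lags implied by fmax and fmin."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo = int(sr / fmax)                     # shortest lag considered
    hi = int(sr / fmin)                     # longest lag considered
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 44100
t = np.arange(4410) / sr                    # a short 100 ms window
# A clean harmonic tone works: 220 Hz fundamental plus a 440 Hz harmonic.
x = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
print(round(autocorr_pitch(x, sr), 1))      # close to 220 Hz
# On real, inharmonic, quasi-periodic material the peak gets pulled
# towards harmonics or subharmonics, giving octave (and worse) errors.
```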

The sounds in your corpus are both very short and quite inharmonic (especially in the lower registers), neither of which is going to make things easier.

Using @unit 1 to output MIDI note numbers, and taking the median over slices, here’s the overall trajectory with

  1. YinFFT
  2. HPS

Both have similar pathologies. They struggle with the very lowest notes (probably a function of the duration: lower frequencies take longer to resolve and need a nice big window). Then there’s the same sudden shift about a third of the way up (which is probably an octave shift?). HPS then falls apart for a while, but does ok (or at least, is consistent) at the top, whereas YinFFT falls apart for the highest values.
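For reference, the ‘MIDI note numbers + median over slices’ summarising step amounts to this (a Python sketch with made-up frame values; @unit 1 does the Hz-to-MIDI conversion inside fluid.bufpitch~):

```python
import math
import statistics

def hz_to_midi(hz):
    # The standard conversion: 440 Hz -> MIDI note 69 (A4).
    return 69 + 12 * math.log2(hz / 440.0)

# Hypothetical per-frame estimates for one 250 ms slice: mostly ~220 Hz
# (A3), with one octave error in the middle.
frames_hz = [219.8, 220.3, 440.0, 220.1, 219.9]
midi = [hz_to_midi(f) for f in frames_hz]
# The median shrugs off the outlier, where a mean would be dragged upwards.
print(round(statistics.median(midi)))  # -> 57 (A3)
```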

What I’d suggest, to try and work out some settings that work across the range, is to audition with the real-time fluid.pitch and the sounds looping (if they can). It might just be that 11025 samples is insufficient (given that each sound also has an onset transient). It might also be that different settings are called for to deal with the extreme ends of the range.


I understand the pitch plotting issue. I also understand the relevance of displaying the mean statistical value, but if I got it right, to have a global view of the pitch analysis I should adapt the number of analysis frames to each segment’s duration and then concatenate all these data, segment after segment, before displaying the whole batch. I have an idea of how to do that, but maybe there’s some specific object/abstraction which would help deal with this the easy way?

About the pitch tracking itself: one might imagine tweaking some settings to adapt the patcher to this pre-segmented, somewhat homogeneous and relatively small sound corpus, but then, how should one proceed with an unsegmented soundfile with unknown features, or a big sound corpus containing all sorts of sounds? :thinking:

This is one of the central questions of the project. In general there isn’t a way to point a computer at an unknown, varied corpus and get useful information without deploying at least some prior beliefs about what you might encounter. For instance, there may be sounds that are unpitched, or sounds that are pitched but with very low fundamentals that would imply analysis settings which would be impractical to use on a whole corpus. We don’t believe that there are likely to be singular solutions to these sorts of issues, depending as they do both on the material at hand and on what one wants to achieve. On the other hand, we do hope that people repeatedly grappling with them in their projects, and taking the opportunity to discuss things here as they go, will build up a store of community knowledge and perhaps even more general flavours of solution.

In this particular case, I suspect that it’s the brevity of the chunks that is causing problems. 250 ms isn’t a lot to go on. It might be interesting to experiment with some dirty data ‘augmentation’ for this particular case: give bufpitch more to go on by looping the sound with fluid.bufcompose before analysis. Yes, it will slow things down, but it may well give better results. If one were to try that, I’d suggest trimming the transient off beforehand as well.
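In plain numpy terms (a sketch of the idea, not the fluid.bufcompose patching itself, and with a made-up 30 ms transient length), the augmentation would look like:

```python
import numpy as np

def loop_segment(segment, sr, transient_ms=30, repeats=4):
    """Trim an assumed onset transient, then tile the steady part so the
    pitch analysis has more (quasi-)periods to work with."""
    start = int(sr * transient_ms / 1000)
    return np.tile(segment[start:], repeats)

sr = 44100
seg = np.sin(2 * np.pi * 220 * np.arange(int(0.25 * sr)) / sr)  # 250 ms tone
looped = loop_segment(seg, sr)
print(len(looped))  # 4 * (11025 - 1323) = 38808 samples
```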


Looping? Interesting. I’m gonna try.

Could oversampling help too?

I suspect it might actually hinder. It won’t add any information that wasn’t there before, and will have the side-effect of worsening the resolution in the bass (for the same analysis settings).

However, bearing in mind how poorly YinFFT was handling your samples in the very top register, it might be worth turning the @maxfreq up for those and seeing if it helps…

Would it be too much work, or even simply possible, to train some neural network using FluCoMa objects to improve the sorting according to pitch?

It would be a lot of experimentation, and similarly no guarantee of getting something to generalise. But it would be interesting to try.

One approach might be to try and train a classifier to determine if sound A is ‘less than’ or ‘greater than’ sound B. The tricky thing will be finding compact enough features that can work robustly. In the samples you’ve got here, I quickly tried using the spectral centroid to see if that gave a sensible ordering and it still doesn’t work on those particular sounds (because, I guess, of the inharmonicity).
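That quick centroid experiment is easy to reproduce with synthetic material (a numpy sketch; the frequencies are invented, chosen to land on FFT bins so leakage doesn’t muddy the picture):

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Magnitude-weighted mean frequency of the (one-sided) spectrum."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

sr = 44100
t = np.arange(4410) / sr                 # 100 ms -> 10 Hz bin spacing
low = np.sin(2 * np.pi * 220 * t)        # the 'lower' note
high = np.sin(2 * np.pi * 330 * t)       # the 'higher' note
# For clean harmonic tones the centroid does track pitch order...
assert spectral_centroid(low, sr) < spectral_centroid(high, sr)
# ...but one strong inharmonic partial on the low note flips the ordering:
low_inh = low + 1.5 * np.sin(2 * np.pi * 1800 * t)
print(spectral_centroid(low_inh, sr) > spectral_centroid(high, sr))  # True
```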

edit: I should add that this approach still leaves the problem of implementing a sorting algorithm that’s using the neural network as a predicate, which probably isn’t the most fun kind of thing to do in Max :grimacing:
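For what it’s worth, in a general-purpose language the ‘classifier as sorting predicate’ part is the easy bit: e.g. in Python, with a stand-in function where the hypothetical trained network would go:

```python
from functools import cmp_to_key

def model_says_lower(a, b):
    # Stand-in for the hypothetical trained 'is A lower than B?' network;
    # here we cheat and compare a hidden ground-truth pitch per sound.
    return a["pitch"] < b["pitch"]

def compare(a, b):
    # Adapt the boolean predicate to the -1 / +1 contract a sort expects.
    return -1 if model_says_lower(a, b) else 1

sounds = [{"id": 2, "pitch": 64}, {"id": 0, "pitch": 60}, {"id": 1, "pitch": 62}]
ordered = sorted(sounds, key=cmp_to_key(compare))
print([s["id"] for s in ordered])  # -> [0, 1, 2]
# The hard parts remain doing this pairwise-comparison sort inside Max,
# and getting the classifier to generalise in the first place.
```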