PESTO - pitch detection algorithm

Much like the Hz thread from a while back, I came across a newer algorithm from Sony today called PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective.

Basically a neural pitch detection algorithm.

Someone’s made a Max external, which is handy: pesto~.

I had a quick play with it and although at smaller block sizes it’s quite shit, at 1024 (the default it seems), the tracking is really nice. Quite a bit better than Yin.

Obviously there are so many other things that would be interesting to have (if/when dev time comes up), but I just wanted to post this here in case it’s of interest as a more cutting-edge pitch detection algorithm (whereas I don’t think Hz is so cutting edge, more just use-case-optimized).

That’s cool. Can you give us a bit of quantification on this, maybe some plots and stats comparing the two?
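
Something along these lines would already be useful. This is only a rough sketch, assuming you can dump each tracker's per-frame pitch in Hz alongside a known reference (e.g. a synthesized test tone or an annotated recording). It computes raw pitch accuracy (the fraction of voiced frames within 50 cents of the reference, a common convention in melody-extraction evaluation) plus the mean signed error in cents. The function names and the example numbers are made up for illustration; they are not measurements of yin or pesto~.

```python
import math


def cents(f_est, f_ref):
    """Signed distance from f_ref to f_est in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_est / f_ref)


def compare(est_hz, ref_hz, tol_cents=50.0):
    """Return (raw pitch accuracy, mean signed error in cents).

    Frames where the reference is <= 0 Hz are treated as unvoiced and skipped.
    Estimates <= 0 Hz on voiced frames count as misses and are left out of
    the mean error.
    """
    errors = []
    hits = 0
    voiced = 0
    for e, r in zip(est_hz, ref_hz):
        if r <= 0:
            continue
        voiced += 1
        if e <= 0:
            continue
        c = cents(e, r)
        errors.append(c)
        if abs(c) <= tol_cents:
            hits += 1
    rpa = hits / voiced if voiced else 0.0
    bias = sum(errors) / len(errors) if errors else 0.0
    return rpa, bias


# Hypothetical per-frame values, purely to show the call pattern.
ref = [220.0, 220.0, 0.0, 440.0]
tracker_a = [220.0, 226.0, 0.0, 440.0]
tracker_b = [220.0, 233.0, 0.0, 880.0]

print("tracker_a:", compare(tracker_a, ref))
print("tracker_b:", compare(tracker_b, ref))
```

A positive mean error means the tracker reads sharp on average, a negative one flat. An octave error shows up as roughly ±1200 cents, so it is worth looking at a histogram of the errors as well as the mean.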

Not gone that deep with it, just did some A/B-ing with it in patches and listening.

Don’t really know what kind of material would be suitable for quantifying the difference.

Actually here’s a quick screen capture:

nice. thanks for the demo!

can you compare with pesto @conf 0.70 or fftyin with threshold at 0.98? What I hear is yin spitting pitch with low (0.7 is low) confidence…

it does look fast though

Here’s a follow up video showing some more.

It seems like the confidence in pesto is way stickier/better (even with this funky vocal sample), and also seems to track the microtonal stuff even better (by comparison the yin sounds a bit sharp most of the time).

it is indeed more accurate. Sad that the license is not compatible (iirc), otherwise we could eventually have incorporated it. How’s the CPU usage?

I don’t know this stuff too well, but I guess this means it’s GPL3.

Crazy crazy cheaper. Like, crazy.

Yin seems to move between 17% and 45%, whereas PESTO stays between 3% and 4% (for 100 instances of each).

Yin:

Pesto:
