Harmonic audio matching

Hi there,

I’ve been lurking on the forum for the past few months and you’ve all been a great source of information while I’ve been getting more comfortable with FluCoMa. For a few weeks now I have been thinking about and experimenting with sound-into-sound concepts. Mainly inspired by @rodrigo.constanzo’s Sound-Into-Sound work, I thought it would be interesting to build a framework that provides some sort of ‘harmonic decoration’ on top of incoming live audio. Roughly: (live audio) → (live pitch analysis) → (trigger match in corpus) → (live audio + corpus match), via dry/wet mixing or some cross-synthesis alternative.

My initial idea is to create a sample folder containing exclusively samples with strong tonal characteristics. However, in contrast to the Sound-Into-Sound work, since the idea is to build this framework for more general use, those characteristics are probably the only criterion I’ll base my sample folder on.

With this context in mind we arrive at the main question: how would I match my incoming audio to its harmonic matches in the corpus? That is, say I have a synth pad coming in at 150 Hz. How can I match entries other than the closest one? If the nearest neighbour is at 150 Hz, that’s fine, but if my nearest neighbour is at 139 Hz, I’d rather match entries at 75 Hz, 200 Hz or 300 Hz, if there are any.
Of course, when diving into this matter, more questions arise, like: what should the maximum deviation be before looking for harmonic matches instead of same-frequency matches? And would you design this query flow in series or in parallel?
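To make the idea concrete, here is a minimal language-neutral sketch in Python of the kind of logic I have in mind (the function names, the ratio list and the 50-cent tolerance are my own placeholders, not anything from FluCoMa): try the incoming frequency first, then its harmonics/subharmonics in order of preference, and only accept a corpus entry within some tolerance measured in cents.

```python
import math

def cents(a, b):
    """Distance between two frequencies in cents (100 cents = 1 semitone)."""
    return abs(1200.0 * math.log2(a / b))

def harmonic_match(f_in, corpus_hz, tolerance=50.0, ratios=(1, 0.5, 2, 3, 4)):
    """Return the corpus frequency closest to f_in or to one of its
    harmonics/subharmonics, trying the ratios in order of preference."""
    for r in ratios:
        target = f_in * r
        candidates = [f for f in corpus_hz if cents(f, target) <= tolerance]
        if candidates:
            return min(candidates, key=lambda f: cents(f, target))
    return None  # nothing within tolerance at any ratio

# 139 Hz is ~131 cents away from 150 Hz, so it is skipped
# in favour of the subharmonic at 75 Hz:
print(harmonic_match(150.0, [139.0, 75.0, 300.0]))  # → 75.0
```

The “max deviation” question from above is exactly the `tolerance` parameter here; running the ratios in a preference-ordered loop is the “in series” design, whereas scoring all ratios at once and taking the global best would be the “in parallel” one.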

I’m curious whether anyone already has experience with similar approaches; I’d like to discuss ideas.


The decoration idea is quite cool. Lots of scope to explore there, with regard to not only pitch ornamentation but also temporal ornamentation, etc.

Here are some thoughts on your more specific questions:

For this, I think you’ll find Chroma handy, as it will return all instances of a pitch class regardless of octave. If you want something that considers other intervals (e.g. return a 5th if it can’t find the exact pitch), you can probably build some logic around that. In general though: if you want it to be octave-aware, then Pitch is what you want; if you only care about pitch class, then Chroma is what you want.
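As a sketch of that octave-agnostic idea (hypothetical helper names, not a FluCoMa API): comparing integer MIDI pitches modulo 12 treats 150 Hz and 300 Hz as the same pitch class, and the “fall back to a 5th” logic is just another allowed interval class.

```python
def interval_class(a, b):
    """Pitch-class interval from MIDI note b up to a, in semitones (0..11)."""
    return (a - b) % 12

def octave_agnostic_match(target, corpus, allowed=(0, 7)):
    """Prefer the same pitch class (interval 0); fall back to a 5th above (7).
    target and corpus entries are integer MIDI notes."""
    for iv in allowed:
        hits = [m for m in corpus if interval_class(m, target) == iv]
        if hits:
            # among same-class hits, take the nearest octave
            return min(hits, key=lambda m: abs(m - target))
    return None

print(octave_agnostic_match(60, [72, 65]))  # → 72 (same class, octave up)
print(octave_agnostic_match(60, [67, 65]))  # → 67 (no C found, 5th accepted)
```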

This part you can massage a bit. For example, if you find a match at 140 Hz, should it be ignored? What I’ve had good success with is finding the nearest match as you normally would, but then compensating the playback to make up for the difference. With pitch this is typically a matter of playing back the file faster or slower. For loudness you can do the same by just turning the amplitude up or down.
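That compensation is just a couple of ratios/differences. A tiny sketch (my own helper names, assuming plain varispeed playback where pitch and duration change together):

```python
def rate_compensation(found_hz, target_hz):
    """Playback-speed multiplier so a sample analysed at found_hz
    sounds at target_hz (varispeed: pitch and duration change together)."""
    return target_hz / found_hz

def gain_compensation_db(found_db, target_db):
    """Gain, in dB, to bring a sample's analysed loudness to the target."""
    return target_db - found_db

# a 140 Hz sample played back ~7% faster sounds at 150 Hz
rate = rate_compensation(140.0, 150.0)
print(rate)  # ≈ 1.0714
```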

Additionally, I would convert Hz to MIDI pitch so you don’t have to worry about linear/logarithmic differences (e.g. 140 Hz to 150 Hz represents a much bigger pitch difference than 3990 Hz to 4000 Hz, whereas MIDI note 1 to 2 is the same interval as 119 to 120).

If you need Hz at the end of your chain, just chuck an mtof at the end.
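The conversion itself is the standard equal-temperament formula (the same maths Max’s ftom/mtof objects implement), which also shows why the linear Hz differences above are so misleading:

```python
import math

def ftom(hz):
    """Hz to (fractional) MIDI note number, with 440 Hz = note 69."""
    return 69.0 + 12.0 * math.log2(hz / 440.0)

def mtof(midi):
    """MIDI note number back to Hz."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)

print(round(ftom(150.0) - ftom(140.0), 2))    # → 1.19 semitones
print(round(ftom(4000.0) - ftom(3990.0), 2))  # → 0.04 semitones
```

So a 10 Hz gap is over a semitone in the low register but nearly nothing up high, which is exactly why distances computed on MIDI pitch behave better in a nearest-neighbour query.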



Indeed, what Rodrigo said: try Chroma and see if it works for you. You can split the signal beforehand, with Sines or HPSS, to focus only on the harmonic material. I find this helps a bit for some of my tasks.

As for the ‘general framework’, that is the hard bit, so don’t get discouraged if you get good results on some material and not on others; come back with more specific questions and examples. @rodrigo.constanzo always has great examples that help everyone understand more than just the specifics of the question at hand!

Aha! I was not really convinced that Chroma would be the way to go for this challenge. Although I’d been looking into it, I hadn’t really experimented with it yet… A reminder not to overthink everything before trying things out. The results from the batch analysis already come much closer to my expectations. Thanks for the tip.

Yeah, I found that Sines somewhat polishes the results of the batch processing. I didn’t really dive into HPSS yet, thanks. Sines, however, does not really seem to help me much with the live analysis, but since everything is still in a rough state and I haven’t massaged the parameters much, I could be drawing the wrong conclusions.

Now there’s something wrong in my patch which I just can’t seem to put my finger on: in the batch process, slices are being duplicated, which results in several problems.
Firstly, I should state that I use concataudiofiles to create the main buffer, which is then converted to mono and fed straight into a sines buffer: sound → soundmono → soundmonosines
[Screenshot 2024-04-04 at 13.16.32]

Then, moving on to slicing, I decided to create the slices from the original stereo buffer (sound).
[Screenshot 2024-04-04 at 13.20.17]

Now we get to the point where I fail to understand what’s happening. Looking up the number of slice points, we can see it’s 2237, which should result in 2238 slices. But when printing the dataset, twice the expected number of slices shows up, with each slice duplicated.

[Screenshot 2024-04-04 at 13.28.20]

As mentioned, this results in weird problems, mostly in playback or when navigating the 2D representation. When a slice number from the second half (≥ 2238) gets triggered, the resulting start and end frames are both the very last sample of the entire buffer.

I’ve been trying to find a solution for some time now, but somehow I feel like the problem lies somewhere in the buffer conversions.
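Outside the patch, one quick sanity check is to dump the dataset and scan it for identical entries; if the iteration really is firing twice, every (start, end) pair should appear exactly twice. A minimal Python sketch of that check (my own helper, nothing FluCoMa-specific):

```python
def find_duplicate_slices(slices):
    """slices: list of (start, end) frame pairs, e.g. dumped from a dataset.
    Returns pairs of indices that hold identical values."""
    seen = {}
    dupes = []
    for i, pair in enumerate(slices):
        if pair in seen:
            dupes.append((seen[pair], i))
        else:
            seen[pair] = i
    return dupes

# a doubled iteration shows every entry twice:
print(find_duplicate_slices([(0, 100), (100, 250), (0, 100), (100, 250)]))
# → [(0, 2), (1, 3)]
```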


The first 2 slices have exactly the same values. This is not a good sign: it might mean that your iteration logic is running the analysis twice. What I’d do is unplug the uzi and send a single list of 2 slices by hand, to see whether you get 1 or 2 entries, and then find where the double trigger is…