Automated pitched/noise segmentation and organisation

UnUnUnium · September 26, 2023, 6:43pm

Hello. I’m very new to FluComa, but not to Max, where I would be implementing my idea.

I want to be able to take an urban field recording, containing automotive noise, voices, music from shops and cars, and segment it into single-note, pitched samples, as well as noise sounds with no tonal centre.

So for example, it would find pitched sounds in a voice, and cut the length so that only a single, stable fundamental pitch was present in each segment - in other words, probably stable syllables, no diphthongs.

Then, it would build a chromatic scale from all the pitched material, starting with the lowest-pitched segment and transposing the other segments (only where necessary) by as much as needed to fit the scale.

I am thinking of an automated workflow something like this:

Segment and separate pitched sections and noise sections of recording.
Find longest possible single-pitch segments amongst the pitched sections.
Categorise segments according to given pitch
Change pitch of segments to fit scale

There would be some decisions to make, for example if there were many segments of the same or very similar pitch, but the idea is essentially that you would end up with a pitched, multi-sampled instrument that could be played chromatically, extracted from an arbitrary field recording.

Is something like this possible in FluComa? Any leads appreciated!

rodrigo.constanzo · September 26, 2023, 7:06pm

Howdie!

Definitely possible, with varying amounts of complication along the way depending on how far you want to push each step.

A good jumping off point might be this series from @brookt:

Not exactly the same thing you’re wanting to do, but does go over a lot of the same beats (segmentation, sorting by various descriptors, etc…).

Beyond that, you could then build/refine something to make decisions about the segments you end up with. For example: if there are multiple options for a note, do you select one randomly or take the one closest to the correct pitch? If there isn’t a match for a given note, do you transpose or timestretch/correct? If there’s no exact pitch but you have the same pitch from a different octave, is that suitable (chroma descriptor)? Etc…
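To make the "closest to the correct pitch" option concrete, here is a minimal sketch in Python rather than Max. It assumes you already have one fundamental frequency per segment; the `(name, hz)` pair format and the function names are hypothetical, not part of FluCoMa.

```python
import math

def hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency (A4 = 440 Hz = MIDI 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def build_chromatic_map(segments, low_note, high_note):
    """For each MIDI note in [low_note, high_note], pick the segment whose
    pitch is nearest and report the shift (in semitones) needed to reach it.

    segments: list of (name, fundamental_hz) pairs (hypothetical format).
    Returns {midi_note: (name, shift_in_semitones)}.
    """
    pitched = [(name, hz_to_midi(hz)) for name, hz in segments]
    mapping = {}
    for note in range(low_note, high_note + 1):
        # Decision rule assumed here: always take the nearest segment.
        name, midi = min(pitched, key=lambda seg: abs(seg[1] - note))
        mapping[note] = (name, note - midi)
    return mapping

# Made-up segment pitches, purely for illustration.
segments = [("voice_a", 220.0), ("horn", 233.0), ("hum", 262.0)]
chromatic = build_chromatic_map(segments, 57, 60)
for note, (name, shift) in chromatic.items():
    print(note, name, round(shift, 2))
```

Swapping the `min(...)` line for a random choice among segments within some tolerance would give the "select one randomly" behaviour instead.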

You can also really refine the segmentation too, and potentially include signal decomposition (HPSS or NMF) to extract pitched material from noisy material, if it's desirable to do so. I’ve had good results in the past “pre-decomposing” material using one (or a combination of) those algorithms and then looking for segments within that separately.

Hope that offers some jumping off points!

UnUnUnium · September 26, 2023, 7:43pm

Hello Rodrigo,

Thank you for your reply! Yes, this is exactly what I needed to know.

Indeed, the idea is as you say - if the pitch doesn’t match, a simple transpose by the ratio closest to that interval.

I think I can work out the maths of that part reasonably well once I know the actual pitches and the desired pitches - it’s the separation into noise/pitched and the segmenting into single pitched elements which I need to put the most work into understanding.
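For reference, the maths for that part can be sketched as follows (Python rather than Max, and assuming 12-tone equal temperament with A4 = 440 Hz; the function names are just illustrative):

```python
import math

def nearest_chromatic_hz(freq_hz, ref_hz=440.0):
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones = round(12.0 * math.log2(freq_hz / ref_hz))
    return ref_hz * 2.0 ** (semitones / 12.0)

def transpose_ratio(actual_hz, target_hz):
    """Playback-rate ratio that moves actual_hz onto target_hz."""
    return target_hz / actual_hz

def ratio_to_cents(ratio):
    """Size of a frequency ratio in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(ratio)

actual = 452.0                      # measured pitch of a segment (made up)
target = nearest_chromatic_hz(actual)
ratio = transpose_ratio(actual, target)
print(target, ratio, ratio_to_cents(ratio))
```

Note that changing pitch by playback rate also changes the duration by the same ratio, which may or may not matter for short segments.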

Actually though, looking further into the video series you sent, it looks like he is getting very close to this effect, so it seems it might be easier to do than I thought!

I will post again when I get somewhere. Thanks again for the head start!

tremblap · September 27, 2023, 7:51am

There was a thread about exactly this here a year ago. Try searching, and if that's not successful I’ll try too - I have to run now.

UnUnUnium · September 27, 2023, 10:13am

Thank you! I found this thread, pertaining to the separation of consonants and vowels: vowels/consonants

The part I’m more confused about is how to get it to only choose the length of a pitched sound while it is stable - in other words staying at the same fundamental pitch.
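One way I could imagine approaching it, sketched in Python rather than Max: take per-frame pitch and confidence values (the kind of thing a pitch analyser writes out frame by frame) and keep only runs of frames that stay within a few cents of the run's median. The thresholds and the function name below are assumptions to tune, not anything FluCoMa prescribes.

```python
import math
from statistics import median

def cents_between(f1, f2):
    """Signed distance from f2 to f1 in cents."""
    return 1200.0 * math.log2(f1 / f2)

def stable_runs(pitches_hz, confidences, tol_cents=30.0,
                min_conf=0.8, min_frames=3):
    """Return (start, end) frame index pairs, end exclusive, where the
    pitch is confident and stays within tol_cents of the run's median."""
    runs = []
    start = None
    for i, (hz, conf) in enumerate(zip(pitches_hz, confidences)):
        voiced = conf >= min_conf and hz > 0
        if voiced and start is not None:
            centre = median(pitches_hz[start:i])
            if abs(cents_between(hz, centre)) <= tol_cents:
                continue  # this frame extends the current run
        # Otherwise the current run (if any) ends at this frame.
        if start is not None and i - start >= min_frames:
            runs.append((start, i))
        start = i if voiced else None
    if start is not None and len(pitches_hz) - start >= min_frames:
        runs.append((start, len(pitches_hz)))
    return runs

# Made-up analysis frames: a note near 220 Hz, a glide, a note near 262 Hz.
pitches = [0, 220, 221, 219, 220, 240, 262, 261, 262, 263, 0]
confs = [0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.2]
runs = stable_runs(pitches, confs)
print(runs)
```

Multiplying the frame indices by the analysis hop size would give sample positions for slicing, and the median pitch of each run would be the "given pitch" for that segment.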

If you find anything let me know - in the meantime I will be watching FluComa tutorials!