Transcription Use Case

Is anyone using FluComa to automate the transcription of melodies?

I built a real-time pitch-tracking system using SuperCollider’s Tartini.kr. It’s very rough, but you can look at it here:

I’m using mainly monophonic solo saxophone music as source material. I’ve found it interesting to explore some of the ambiguities of clean pitch tracking. For example, sometimes the pitch is only discernible after the onset of the note, so there are difficulties relating to timing.

I would like to use Flucoma’s Buffer based tools for offline processing.
I’ve started playing a bit with FluidBufPitch but I’m having a similar problem: I’m not sure whether the pitch features the UGen extracts are coincident with note onsets. Are there any examples of using FluidBufPitch and FluidBufOnsetSlice together?

Eventually, I would like to examine all my sources together as a corpus. Some of the things I’m interested in doing: isolating solos from backgrounds and accompaniment, and classifying melodic shapes.

Any ideas are welcome.

FluidBufPitch should correct for the latency of the processing itself (due to the STFT buffering), although it is possible that we’re not getting the latency report quite right.

More likely, though, the pitch tracker (any pitch tracker, I would guess) will take some amount of time to decide that something has changed, but I would expect (hope) that the pitch confidence output would be correspondingly low while that’s happening.

@tremblap has certainly spent more time than I trying to clean up tracked pitch, but some things that might be helpful in post-processing are

  • Going segment by segment, as you suggest (which is ok if the segments are ok…)
  • Using FluidBufStats on segments to get an aggregated estimate for that chunk (e.g. using the median)
  • Using FluidBufThresh to gate out moments with low confidence, which, IIRC, can then be used as a way to pass weightings to FluidBufStats.

I think in the examples folder there’s a (slightly involved) example showing some of this at Examples/dataset/1-learning examples/10b-weighted-pitch-comparison.scd
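To make the gate-then-summarise idea concrete outside of SuperCollider, here’s a minimal Python sketch; the pitch and confidence arrays are made-up stand-ins for the two channels of a FluidBufPitch analysis, and the thresholding mimics what FluidBufThresh would do:

```python
# Hypothetical per-frame pitch (MIDI) and confidence values for one segment.
pitches = [60.1, 60.0, 59.9, 47.3, 60.2, 60.0]
confidences = [0.95, 0.97, 0.93, 0.10, 0.96, 0.94]

# Gate out frames whose confidence falls below a threshold (the FluidBufThresh
# idea), then summarise the survivors with the median (the FluidBufStats idea).
threshold = 0.5
kept = sorted(p for p, c in zip(pitches, confidences) if c >= threshold)
median_pitch = kept[len(kept) // 2]
# The spurious 47.3 (confidence 0.10) is discarded; median_pitch is 60.0
```

The weighted-stats route in 10b is the more refined version of the same idea: rather than a hard gate, each frame’s contribution to the statistics is scaled by its confidence.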

I’m going to spend some time looking at the examples.

I was looking at the confidence measures originally but was having trouble when converting them to MIDI notes. I realize now that I was filtering the results incorrectly.

Thanks for the suggestions.

Hello

Example 10b is quite advanced indeed, explaining all the problems I had to go around to try to get a good pitch estimate.

The object already has a MIDI/cents output option (@unit is your friend there), but the issues of segmentation, confidence, and loudness are quite important and are explicitly dealt with in 10b.

I’m happy to help if further questions arise. It is often very material-related, as you will discover…

Given a FluidBufOnsetSlice and a FluidBufPitch analysis buffer, how would I act on them segment by segment?

Basically, once I filter out the items analyzed by FluidBufPitch with low confidence, the tracking is good. My source material is relatively clear. However, what I’m getting now is many separate values for slight pitch variations due to vibrato.

What I really want is just a list of MIDI notes, rounded and de-duplicated, assuming relatively coarse quantization.
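For what it’s worth, that last step is easy to sketch in Python (the input is a hypothetical list of per-segment pitch estimates in MIDI):

```python
# Hypothetical per-segment mean pitches, in MIDI, with vibrato wobble.
raw = [60.2, 60.1, 59.9, 62.05, 61.98, 64.4]

# Round to the nearest semitone, then collapse consecutive repeats so that
# several slightly different values around one note become a single note.
rounded = [round(p) for p in raw]
notes = [n for i, n in enumerate(rounded) if i == 0 or n != rounded[i - 1]]
# rounded -> [60, 60, 60, 62, 62, 64]; notes -> [60, 62, 64]
```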

I’m finding this a little hard to parse; would you be able to clarify? :slight_smile: I think I understand what you’re trying to say, but please correct me. Is it the case that you have some segments and companion pitch analysis, and you want to produce a summary of the pitch for each segment? That’s definitely within the remit of what the tools can offer you, but it’s worth diving in at a speed that suits you and will be fruitful in your learning process.

Yes, that’s correct.

Right now I have a relatively clean stereo sound file, which I sum to mono. I run FluidBufOnsetSlice, which returns a buffer of onsets: each frame contains the sample index of a note onset in the original sound file. For my purposes they’re accurate enough that I can trust them (they correspond to individual notes).

For each of these onsets I would like to use FluidBufPitch to detect the average pitch. The end result I want is a reduced Array of MIDI notes: a single MIDI note for each discrete onset.

Right now I have two analysis buffers which I would like to compare (as well as the original sound file buffer). The one returned by FluidBufOnsetSlice and another returned by FluidBufPitch.

So what’s confusing to me:

  • How do the frame values returned by FluidBufPitch relate to timing? I just have a continuous stream of values (with the confidence values interleaved), which makes it difficult to compare the two buffers.
  • What is the most idiomatic way to do the comparison using the toolkit in SuperCollider? Do I need to construct some kind of loop (for each onset range returned by FluidBufOnsetSlice, perform a FluidBufPitch analysis and then run FluidBufStats)? My concern is that the blocking nature of the operation would be cumbersome and inelegant.
  • Are there examples of this particular workflow?

The example Examples/dataset/1-learning examples/10b-weighted-pitch-comparison.scd is illuminating, but more illustrative of a situation where deciding the pitch is ambiguous.

Thanks.

The Buffer returned by FluidBufPitch (or any of the other buffer-based descriptors) has an effective sample rate of [source sample rate] / [analysis hop size]. The hop size will be 512 if you haven’t changed your FFT settings for FluidBufPitch (it defaults to half the window size, and the default window size is 1024). So each frame represents a 512-sample analysis hop.
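In other words, converting an analysis frame index back to source samples or seconds is a single multiplication; a quick Python sketch with the default settings mentioned above (the 44.1 kHz sample rate is assumed for illustration):

```python
sr = 44100  # source sample rate (assumed for illustration)
hop = 512   # default hop size: windowSize (1024) / 2

def frame_to_samples(frame):
    # Each analysis frame advances one hop through the source.
    return frame * hop

def frame_to_seconds(frame):
    return frame * hop / sr

# frame 86 -> sample 44032, i.e. just under one second into the file
```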

The interleaving is what SC does when you load a multichannel buffer to a float array. On the server the pitch estimates are on the first channel and the confidences on the second. You can de-lace them language-side if you need to.
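The de-lacing itself is just de-interleaving every other value; sketched in Python (the flat list is a stand-in for what loadToFloatArray hands you from a two-channel buffer):

```python
# Flat interleaved data from a two-channel analysis buffer:
# [pitch0, conf0, pitch1, conf1, ...]
flat = [60.0, 0.9, 60.1, 0.95, 59.9, 0.2]

pitch_track = flat[0::2]       # even indices: the pitch channel
confidence_track = flat[1::2]  # odd indices: the confidence channel
# pitch_track -> [60.0, 60.1, 59.9]; confidence_track -> [0.9, 0.95, 0.2]
```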

There are a few ways of attacking this. The trick is to minimise the amount of time spent synchronising with the server, as that slows things down markedly. In the below I’ve tried for something relatively legible (i.e. not lots of nested action callbacks) that stays on the server as much as possible. Is that the sort of thing you were after?

(
fork{  
    //Set up some variables
    var meanPitches = FluidDataSet.new(s);//we'll keep our data in here while we process
    var onsetsBuffer = Buffer.new; //this will be our onset positions
    
    //This function does the main analysis. 
    //Putting it up here to try and reduce clutter
    var meanPitchForSlice = {|start,end| 
        var pitches = Buffer.new; 
        var stats = Buffer.new;         
        //we'll use this to filter lower confidence pitches
        var confidences = Buffer.new; 
        
        if(start >= end) {"Invalid indices: %, %".format(start,end).throw}{}; //just in case
        
        //By using processBlocking here, life is made simpler: everything happens in order
        //on the server command queue
        FluidBufPitch.processBlocking(s,b,start,end - start,features:pitches,unit:1);         
        //copy the pitch confidence channel (the second channel of the analysis)
        FluidBufSelect.processBlocking(s,pitches,confidences,channels:1); 
        //use it as the weights for our stats 
        FluidBufStats.processBlocking(s,pitches,numChans:1, stats:stats,weights:confidences); 
        
    
        //put the stats into a data set, using the start time as a lookup key for later
        meanPitches.setPoint(start,stats); 
        meanPitches.size(action:{|size| "Slice %".format(size).postln}); 
        s.sync; //wait for stuff to finish 
        //then tidy up
        pitches.free; 
        stats.free; 
        confidences.free; 
    };
    
    //Main action: first get the onsets, and wait 
    FluidBufOnsetSlice.process(s,b,indices:onsetsBuffer, metric:5,threshold:0.15).wait; 

    //Get onsets as array, then iterate over in pairs of [start,end], calling the fn above
    onsetsBuffer.loadToFloatArray(action:{ |a|
        var onsets = Array.newFrom(a) ++ (b.numFrames - 1); 
        var slices = onsets.slide(2).clump(2); //rearrange to [start,end]
        "Processing % slices".format(slices.size).postln; 
        slices.do{|range|
            meanPitchForSlice.value(range[0],range[1]); 
        };             
    });
    
    // We've filled a FluidDataSet with stuff, now dump it as a Dictionary 
    meanPitches.dump{|dict|
        //Dictionaries are unordered, and the keys here are Strings. 
        //Collect the keys as a sorted array of floats 
        var data = dict["data"]; //actual goodies in sub-dictionary called 'data' 
        var keys = Array.newFrom(data.keys).collect{|k|k.asFloat}.sort; 
        
        keys.do{|startTime|
            //The mean pitch is the 0th stat, grab it and round
            var meanPitch = data[startTime.asString][0].round; 
            "%: %".format(startTime,meanPitch).postln; 
        }        
    };
    s.sync; 
}
)

Yes, that’s helpful, especially the processBlocking.

I’m trying to modify it in a similar way to this post but I’m having difficulty.

a = { |b| // parameterize soundFile buffer
  //Set up some variables
  var meanPitches = FluidDataSet.new(s); //we'll keep our data in here while we process
  ...

  var keys; // declare output variable up top
  var cond = Condition.new;
  ...

    keys.do{|startTime|
      //The mean pitch is the 0th stat, grab it and round
      var meanPitch = data[startTime.asString][0].round;
      "%: %".format(startTime,meanPitch).postln;
    }
  };
  cond.hang;
  keys // or keys.yield
};

fork{ ~result = a.value(b) }
~result; // -> a Routine

Expected output is an Array.

What did you want in the array? Just a list of the quantized pitches?

You shouldn’t need to add any new Conditions – the server syncs should be controlling flow properly.

(
~analyze = { |buf|
    //Set up some variables
    var meanPitches = FluidDataSet.new(s);//we'll keep our data in here while we process
    var onsetsBuffer = Buffer.new; //this will be our onset positions
    var meanPitchesArray = []; 
    //This function does the main analysis. Putting it up here to try and reduce clutter
    var meanPitchForSlice = {|start,end| 
        var pitches = Buffer.new; 
        var stats = Buffer.new;         
        var confidences = Buffer.new; //we'll use this to filter lower confidence pitches
        
        if(start >= end) {"Invalid indices: %, %".format(start,end).throw}{}; //just in case
        
        //By using processBlocking here, life is made simpler: everything happens in order
        //on the server command queue
        FluidBufPitch.processBlocking(s,buf,start,end - start,features:pitches,unit:1); 
        
        //copy the pitch confidence channel (the second channel of the analysis)
        FluidBufSelect.processBlocking(s,pitches,confidences,channels:1); 
        //use it as the weights for our stats 
        FluidBufStats.processBlocking(s,pitches,numChans:1, stats:stats,weights:confidences); 
        
        //put the stats into a data set, using the start time as a lookup key for later
        meanPitches.setPoint(start,stats); 
        meanPitches.size(action:{|size| "Slice %".format(size).postln}); 
        s.sync; //wait for stuff to finish 
        //then tidy up
        pitches.free; 
        stats.free; 
        confidences.free; 
    };
    
    //Main action: first get the onsets, and wait 
    FluidBufOnsetSlice.process(s,buf,indices:onsetsBuffer, metric:5,threshold:0.15).wait; 

    //Get onsets as array, then iterate over in pairs of [start,end], calling the fn above
    onsetsBuffer.loadToFloatArray(action:{ |a|
        var onsets = Array.newFrom(a) ++ (buf.numFrames - 1); 
        var slices = onsets.slide(2).clump(2); //rearrange to [start,end]
        "Processing % slices".format(slices.size).postln; 
        slices.do{|range|
            meanPitchForSlice.value(range[0],range[1]); 
        };             
    });
    
    // We've filled a FluidDataSet with stuff, now dump it as a Dictionary 
    meanPitches.dump{|dict|
        //Dictionaries are unordered, and the keys here are Strings. 
        //Collect the keys as a sorted array of floats 
        var data = dict["data"]; //actual goodies in sub-dictionary called 'data' 
        var keys = Array.newFrom(data.keys).collect{|k|k.asFloat}.sort; 
        
       keys.do{|startTime|
            //The mean pitch is the 0th stat, grab it and round
            var meanPitch = data[startTime.asString][0].round; 
            // "%: %".format(startTime,meanPitch).postln; 
            meanPitchesArray = meanPitchesArray.add(meanPitch); 
        }      
        
    };
    s.sync; 
    "Analysis complete".postln; 
    meanPitchesArray 
}
)
//obviously you'll need your own path here...
~audio = "/Users/owen/Documents/Max 8/Packages/flucoma-max/media/Tremblay-AaS-AcBassGuit-Melo-M.wav"; 
b = Buffer.read(s,~audio) ; 
~results = []; 
fork { ~results = ~analyze.value(b); } 
~results

I must have made a mistake somewhere. This works perfectly. Thanks.