FluidBufSelect and FluidDataSet

areacode · December 2, 2024, 6:26pm

Hi again -

I’m in the process of working through the batch-slicing example and the 2D Corpus Explorer demo.

In the former, one starts with a single analysis from FluidBufSpectralShape (the centroid), uses FluidBufStats to derive a mean, and then uses FluidBufSelect. This final step is still a little confusing to me.

Correct me if I’m wrong, but it seems that regardless of any specified buffer size and number of channels, using FluidBufSpectralShape and FluidBufStats produces a 7-channel analysis. This leaves me a little confused about the use of “selection” and “numChannels” arguments in these objects.

Where it gets a little more confusing for me is when bringing in FluidBufSelect. Adjusting FluidBufSelect to select all 7 incoming channels still produces a 1-dimensional dataset, as in the modified example code below.

I suspect the issue I’m having might have something to do with the FluidBufFlatten stage that gets used in the Corpus Explorer example, but I’m a little uncertain about this for a variety of reasons.
Any help would be appreciated!

// produces a 1-dimensional dataset

FluidBufSpectralShape.processBlocking(s,
	source:src,
	features:sliceAnalysis,
	startFrame:start,
	numFrames:numSamps,
	numChans:-1,
	action:{
		FluidBufStats.processBlocking(s,
			source:sliceAnalysis,
			stats:sliceStats,
			numChans:-1,
			action:{
				FluidBufSelect.processBlocking(s,
					source:sliceStats,
					destination:sliceMean,
					indices:0,
					channels:-1,
					action:{
						data.addPoint(sliceIndex, sliceMean,{
							cond.unhang
						});
					});
			});
	});

weefuzzy · December 2, 2024, 10:04pm

SpectralShape → Stats will produce a 7 channel buffer (7 spectral descriptors) but also with 7 frames (7 statistics). So FluidBufSelect is just taking the 0th frame (which is the mean): the default is to take all channels, so you end up with 7 channels and 1 frame.

(You can actually get away without FluidBufSelect now because we added a select argument to FluidBufStats (and other objects) so you can cherry pick what you want directly.)

What FluidBufFlatten does is turn your 7 channel, 1 frame buffer into a 1 channel 7 frame buffer: FluidDataSet::addPoint will simply ignore extra channels, which is why you end up with a 1-D data set if you don’t use it.
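
To make the shapes concrete, here is a minimal, untested sketch of the chain described above, which should end up as a 7-dimensional point (one mean per spectral descriptor). It assumes a booted server `s` with FluCoMa installed, the FluCoMa example file Nicol-LoopE-M.wav, and that consecutive processBlocking calls are run in order on the server, so no nested actions are needed. The buffer names and the identifier "slice-0" are made up for illustration.

```supercollider
(
fork{
	var src = Buffer.read(s, FluidFilesPath("Nicol-LoopE-M.wav"));
	var shape = Buffer(s); // expected: 7 channels (descriptors) x many frames
	var stats = Buffer(s); // expected: 7 channels x 7 frames (statistics)
	var means = Buffer(s); // expected: 7 channels x 1 frame
	var flat = Buffer(s);  // expected: 1 channel x 7 frames
	var data = FluidDataSet(s);
	s.sync;

	FluidBufSpectralShape.processBlocking(s, source: src, features: shape);
	FluidBufStats.processBlocking(s, source: shape, stats: stats);
	// frame 0 holds the mean; channels is left at its default ('all')
	FluidBufSelect.processBlocking(s, source: stats, destination: means, indices: [0]);
	// without this step addPoint would only see channel 0
	FluidBufFlatten.processBlocking(s, source: means, destination: flat);
	data.addPoint("slice-0", flat);
	data.print;
}
)
```

If the shapes are as described, data.print should report a single point with 7 columns. Skipping the FluidBufFlatten line and passing means straight to addPoint should reproduce the 1-D result from the first post.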

areacode · December 2, 2024, 11:23pm

Thanks - so, we get 7 channels (spectral descriptors) and 7 frames (statistic functions), and send them to FluidBufSelect. If I wanted, say, the spectral centroid mean and the kurtosis standard deviation - I would think I could cherry-pick in FluidBufStats by something like the following, but clearly I’m misunderstanding something.

Alternatively, if FluidBufSelect can pick from 7 channels / 7 frames, I would also be interested in how to format that message…

FluidBufSpectralShape.processBlocking(s,
	source:src,
	features:sliceAnalysis,
	startFrame:start,
	numFrames:numSamps,
	select: [\centroid, \kurtosis],
	action:{
		FluidBufStats.processBlocking(s,
			source:sliceAnalysis,
			stats:sliceStats,
			numChans:7,
			select: [\mean, \std],
			action:{
				data.addPoint(sliceIndex, sliceMean,{
					cond.unhang
				});
			});
	});
cond.hang;
// closers for the enclosing code, not shown here
});
});

weefuzzy · December 2, 2024, 11:52pm

That will give you both the mean and stddev for the centroid and kurtosis, in a 2 channel, 2 frame buffer. I think the simplest way to get just the centroid mean and the kurtosis stddev would be to take what you have and then flatten → select before the addPoint. Because the flattened buffer is only a single channel, it’s easy just to cherry pick the samples you want (0 for the centroid mean, 3 for the kurtosis stddev).

Untested, but for illustration:

FluidBufSpectralShape.processBlocking(s,
	source:src,
	features:sliceAnalysis,
	startFrame:start,
	numFrames:numSamps,
	select: [\centroid, \kurtosis],
	action:{
		FluidBufStats.processBlocking(s,
			source:sliceAnalysis,
			stats:sliceStats,
			numChans:7,
			select: [\mean, \std],
			action:{				
				FluidBufFlatten.processBlocking(s, 
					source:sliceStats, 
					destination: flatStats, 
					action: {
						FluidBufSelect.processBlocking(s,
							source:flatStats, 
							destination: finalFeatures, 
							indices: [0, 3], 
							action: {
								data.addPoint(sliceIndex, sliceMean,{
									cond.unhang
								});	
						}); 
...

areacode · December 3, 2024, 12:20am

I adjusted the data.addPoint so that it reads:
data.addPoint(sliceIndex, finalFeatures)
Which I think is what is intended there.

But I then got this error:
ERROR: FluidBufStats - Start channel 0 out of range.

Removing the lines that specify the selection (select: [\centroid, \kurtosis] and select: [\mean, \std]) prints the 2 specified columns in the dataset, since the [0, 3] indices probably only apply when receiving 7 frames. However, changing the indices to -1 and restoring the “select” arguments produces the same error as above.

One other thing worth clarifying: does filtering this information affect performance in a meaningful way, or is it just as effective to filter the dictionary dataset?

weefuzzy · December 3, 2024, 1:05am

It’s numChans:7 in your call to FluidBufStats that’s causing that problem (but the error message is incorrect, which is a bug). The input to stats from spectralshape now only has two channels. So if you change that to numChans:-1 (which means ‘all’) or leave it out entirely, stuff should work.

This runs ok for me:

s.boot
(
fork{
	// source sound plus one buffer per processing stage
	var src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav"));
	var sliceAnalysis = Buffer(s);
	var sliceStats = Buffer(s);
	var flatStats = Buffer(s);
	var finalFeatures = Buffer(s);
	var data = FluidDataSet(s);
	s.sync; // wait for the server to finish the pending buffer commands

	// keep only 2 of the 7 descriptors, so sliceAnalysis has 2 channels
	FluidBufSpectralShape.processBlocking(s,
		source:src,
		features:sliceAnalysis,
		select: [\centroid, \kurtosis],
		action:{
			// mean and stddev of each channel: 2 channels x 2 frames
			FluidBufStats.processBlocking(s,
				source:sliceAnalysis,
				stats:sliceStats,
				numChans:-1, // -1 means 'all'
				select: [\mean, \std],
				action:{
					// flatten to 1 channel x 4 frames
					FluidBufFlatten.processBlocking(s,
						source:sliceStats,
						destination: flatStats,
						action: {
							// pick centroid mean (0) and kurtosis stddev (3)
							FluidBufSelect.processBlocking(s,
								source:flatStats,
								destination: finalFeatures,
								indices: [0, 3],
								action: {
									data.addPoint(0, finalFeatures,{
										"point!".postln;
									});
								});
						});
				});
		});
}
)

weefuzzy · December 3, 2024, 1:07am

Not meaningfully. Doing it up front like this should, in principle, make for simpler code. Using DataSetQuery isn’t lightning fast, but I don’t think you’d notice a difference except with larger datasets.
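
For comparison, filtering after the fact might look something like the untested sketch below. It assumes a dataset whose 4 columns follow the flattened order discussed in this thread (centroid mean, centroid stddev, kurtosis mean, kurtosis stddev) and uses FluidDataSetQuery to copy columns 0 and 3 into a second dataset. The variable names, the identifier "slice-0" and the numbers are placeholders, not real analysis results.

```supercollider
(
fork{
	var full = FluidDataSet(s);
	var subset = FluidDataSet(s);
	var query = FluidDataSetQuery(s);
	var dict = Dictionary.new;
	s.sync;

	// placeholder point: [centroid mean, centroid std, kurtosis mean, kurtosis std]
	dict.add(\cols -> 4);
	dict.add(\data -> Dictionary.newFrom(["slice-0", [1200.0, 300.0, 5.0, 2.0]]));
	full.load(dict);

	// keep only column 0 (centroid mean) and column 3 (kurtosis stddev)
	query.addColumn(0);
	query.addColumn(3);
	query.transform(full, subset);
	subset.print;
}
)
```

This keeps the analysis code generic (every point stores all four values) at the cost of an extra dataset and a query step.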

weefuzzy · December 3, 2024, 1:32am

Whilst we’re about it, every now and again sadness about the amount of nesting that these processing pipelines involve spurs me to try and imagine some nicer way for folk to code these.

As an SC person, what’s your take on this more declarative style below? As it stands, it’s more faff because each Fluid call has to be wrapped in a function (partial application in SC doesn’t work as well as I’d hoped), but for me it still reads more cleanly. Ideally, it would be nice to be able to specify a processing pipeline where you only had to supply a few variables for each process, and it would take care of buffer management, waiting for results and all that stuff.

(
// stand-in for some proper pipeline machinery that lives in my imagination
~pipeline = {|funcs|	
	var c = Condition.new; 
	var a = {
		c.unhang;
		"done".postln; 
	};
	fork{
		funcs.do{|f, i|
			"Process % of %...".format(i + 1,funcs.size).post;
			f.value(a);
			c.hang;
		};
		"pipeline done".postln; 
	};
};

// does this look / feel better in this more declarative style? 
fork{
	var src = Buffer.read(s,FluidFilesPath("Nicol-LoopE-M.wav"));
	var sliceAnalysis = Buffer(s); 
	var sliceStats = Buffer(s); 
	var flatStats = Buffer(s); 
	var finalFeatures = Buffer(s);
	var data = FluidDataSet(s); 
	s.sync; 
	
	~specshape = {|a|		
		FluidBufSpectralShape.processBlocking(s,
			source:src,
			features:sliceAnalysis,
			select: [\centroid, \kurtosis],
			action:a); 
	}; 
	
	~stats = {|a|
		FluidBufStats.processBlocking(s,
			source:sliceAnalysis,
			stats:sliceStats,
			select: [\mean, \std],
			action:a); 
	}; 
	
	~flatten = {|a|
		FluidBufFlatten.processBlocking(s, 
			source:sliceStats, 
			destination: flatStats, 
			action:a); 
	}; 
	
	~sel = {|a|
		FluidBufSelect.processBlocking(s,
			source:flatStats, 
			destination: finalFeatures, 
			indices: [0, 3], // 0 = centroid mean, 3 = kurtosis stddev (after flattening)
			action:a); 
	}; 
	
	~addpoint = {|a|
		data.addPoint(0, finalFeatures,{
			data.print; 
			a.value;			
		});	
	}; 
	
	~pipeline.value([
		~specshape, 
		~stats, 
		~flatten, 
		~sel, 
		~addpoint
	]); 
}
)

areacode · December 3, 2024, 1:58am

I think I generally understand what’s going on at this point, so thanks for helping me through this. There’s just this line:

indices: [0, 3],

Would I not want indices [0, 1]?
…and maybe more importantly, does the fact that I’m still getting a value for index 3 mean that FluidBufStats is passing all 7 frames anyway?

areacode · December 3, 2024, 2:28am

Re: clarified functions

I’m probably not the best person to weigh in on this, since I tend to be pretty idiosyncratic (sloppy) with my code’s appearance and I’m still getting a sense for how to use FluCoMa.

Having also worked with Max/MSP, it strikes me that scheduling/threading/order of events are a unique challenge in SuperCollider. I tend to get suspicious that a dozen nested actions won’t always work as expected, especially as I’ve always run into problems with that stuff outside of FluCoMa… (Using “defer” with updating GUIs can still lead one afoul…and buffer modifying processes often get a little muddy with server v. client navigation, as well)…

I can see how putting the analyses in discrete functions could help with this. Instead of having each analysis hard-coded to be triggered by the previous one, you would have a little more modularity in which things feed where, in whichever order…

As a total amateur, which I can’t stress enough - I also feel like this pipeline approach might make it easier to apply some kind of optional, automatic hierarchy for buffer management. Each stage seemingly has an input and an output buffer, often with a technically descriptive camelCased name…having them all visible on one line would probably help legibility on that front.

On the other hand, I’m (hopefully) quickly adapting to the way FluCoMa works now and I don’t know if I would say this is a problem. I’m in the process of trying to get enough information so I can debug at each stage, to confirm for myself that things are working properly.

areacode · December 3, 2024, 7:41am

Oh wait - I think I understand now.
Is it that FluidBufStats is actually passing four “indices”?
0 and 1 are the mean and standard deviation for channel 0 (centroid).
2 and 3 are the mean and standard deviation for channel 1 (kurtosis).

weefuzzy · December 3, 2024, 10:04am

No, because I flattened it first. There are four values in the buffer after FluidBufStats. Before flattening they are laid out like

|           | Index 0       | Index 1         |
|-----------|---------------|-----------------|
| Channel 0 | Centroid Mean | Centroid Stddev |
| Channel 1 | Kurtosis Mean | Kurtosis Stddev |

After flattening they are laid out in a single channel buffer:

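|           | Index 0       | Index 1         | Index 2       | Index 3         |
|-----------|---------------|-----------------|---------------|-----------------|
| Channel 0 | Centroid Mean | Centroid Stddev | Kurtosis Mean | Kurtosis Stddev |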
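
The two layouts above can be mimicked on the language side with plain arrays, which may help when working out which indices to hand to FluidBufSelect. This is only a model of the buffer layout (no server or FluCoMa objects involved), and the symbol names are labels invented for the illustration.

```supercollider
(
// one inner array per channel, one item per frame (statistic)
~stats = [
	[\centroidMean, \centroidStd], // channel 0
	[\kurtosisMean, \kurtosisStd]  // channel 1
];

// all of channel 0 first, then all of channel 1, as in the flattened table
~flat = ~stats.flatten;

// the equivalent of indices: [0, 3]
~picked = [0, 3].collect { |i| ~flat[i] };

~flat.postln;
~picked.postln;
)
```

Evaluating the block should post the four labels in flattened order, followed by the centroid mean and kurtosis stddev labels.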