Training for real-time NMF (fluid.nmfmatch~)

I run the Match piano example at 30% CPU average, 50% peak (I’ve just checked, and I use the normal single-processor SC server), so I’m interested to see how you get to a lot more than me on a more capable machine…

First thing: there is a known bug in SC where running code from the help file starts two instances (most of the time). Have you copied the code elsewhere before running it?

If yes, let me know your peak and average server values at 44.1k and we’ll take it from there.
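
(In case it is useful: those are the same numbers the server status bar shows, and you can read them straight from sclang.)

s.avgCPU;  // average server CPU load, in percent
s.peakCPU; // peak server CPU load, in percent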

(By the way, I run all the code in both Max and SC before every release, so it should work. Bug reports are very welcome in case I did not catch something!)

Oh sorry, I thought you were re-running the training aspect of it (hidden away in some patch, if I remember correctly). 100% CPU in Activity Monitor sounds like it’s maxing out one core, which is weird… Mine gives me about 30% CPU, with Activity Monitor bouncing around a fair bit but not capping.

@jamesbradbury I presume you got that value in Max. The values I gave were from SC, as that is what Sam was asking about, I think.

Yep, Max. Can’t use SC even if I wanted to!

I am getting around 30% average in SC (34% in Activity Monitor), though the peak is constantly at 250-300%. Very strange behavior. This is why I am wondering if this was compiled in Dev mode or something. I am fairly certain I downloaded Alpha 06.

It was Alpha05, but upgrading to Alpha07 doesn’t change anything.

This is bizarre. Where do you see those spikes: in the bottom-left bit of the window, or in Activity Monitor?

I’ll check this later with a profiler and see if I can figure out what’s happening.

Activity Monitor is at 34%. The spike is in the peakCPU of the server, which on my machine is always waaaaaaaaay more than the avgCPU. This has been an issue with SC for a while, but not usually this bad. With 2000 SinOscs, I have an avgCPU of about 35 and a peakCPU hovering around 80 (with occasional peaks over 100).

Strange results. Here with
{Mix.fill(2000,{SinOsc.ar(200.0.exprand(400),mul:0.0005)})}.play
I get 28% avg and 28-34% peak. My machine is older than yours, and less powerful. Activity Monitor gives me 25-28%.
I’m on vanilla SC 3.10.0, opened in low resolution, if that changes anything (it shouldn’t, but hey!).

With that code, my average is 20 and the peak hovers around 34. The 2016 MBP was no faster than the 2012; sometimes slower.

But I get the same performance issue with FluidNMFMatch on my 2012 MBP, so the machine is not the problem. I am on the current 3.10.2 build.

Good news everybody :grimacing: I get the same performance as Sam when I run that example in SC. Will hunt it down. Like a dog.


@spluta is your hardware driver block size 64, by any chance? That has a pretty pronounced effect on the numbers I see for that example.

Meanwhile, I’ll see if I can narrow down what makes the CPU usage so spiky.

Yes, I always run at 64. I can only get fully glitch-free operation at 512. This is with the internal Mac hardware, not my RME, which I don’t have with me right now.

It would be good if you both shared your server.options, since I cannot reproduce this (except when I run from the help file which, as discussed, starts two instances…).
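
Something like this will print the relevant settings (nil means the device default is in use):

s.options.hardwareBufferSize.postln; // nil = device default
s.options.sampleRate.postln;         // nil = device default
s.options.blockSize.postln;          // internal control block size, 64 by default
s.options.memSize.postln;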

So you get a 28% better average than me, but the same peak. I love processor designs; they are hard to follow :wink:

OK, with
s.options.hardwareBufferSize = 64
I get the same as you (18-23% average and 29-37% peak for the 2000 SinOscs). I also get similarly crappy sound in the piano example… but the Max version makes the same sound at that I/O size! For it not to crap out, the buffer size needs to go up to 512 for that specific example.
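
(One detail worth noting: the option is only picked up when the server reboots, so the comparison goes something like this.)

s.options.hardwareBufferSize = 512; // or 64, to compare
s.reboot;                           // options only take effect at (re)boot
// ...run the piano example, then check:
[s.avgCPU, s.peakCPU];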

Does that make sense?

Now this is OT, kind of, but the hardwareBufferSize thing is probably a big deal for usability:

FluidHPSS, running at 512: 6% average and 8% peak on the CPU meters.
FluidHPSS, running at 64: 5% average and 50% peak on the CPU meters!!!

This seems extreme to me. It goes from a nice process that could run in the background to one that needs its own CPU core.
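
For reference, the HPSS test is nothing fancy. A minimal sketch along these lines (assuming FluidHPSS.ar can be called with just an input and its default parameters; check its help file for the full argument list):

{ FluidHPSS.ar(SoundIn.ar(0)) * 0.5 }.play; // then compare s.avgCPU / s.peakCPU at 512 vs 64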

2000 sines are 18% and 21% at 512 vs 16% and 30% at 64, and the numbers are way less stable at 64.

It is not OT, actually: understanding the problem of 64 as a hardware I/O size is quite important. I’m sure @weefuzzy and @a.harker will give you a better explanation than me of why you actually see a spike (the max), but the process is fundamentally the same: you have an FFT and a (heavy) process that needs to happen at discrete times, and when your hardware I/O buffer size is small, you have very little time for that peak of work to fit. This is true for all frame-based processes… the unlucky block that happens to get a large chunk of processing to do will struggle, hence the peak.
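
To put rough numbers on that (my own back-of-the-envelope arithmetic, assuming 44.1 kHz):

(
// time available per hardware callback: any FFT frame that falls due
// in a given callback has to finish inside that same callback
[64, 512].do { |blockSize|
	var budgetMs = (blockSize / 44100 * 1000).round(0.01);
	"hardware buffer size %: roughly % ms per callback\n".postf(blockSize, budgetMs);
};
)

So at 64 samples you have about 1.45 ms to absorb a burst of work that, at 512, has about 11.6 ms to fit in, which is why the peak meter shoots up while the average barely moves.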

In Max, the native pfft~ has a way to offset the FFT transforms between instances, to spread out the moments when these peaks happen. Again, @weefuzzy and @a.harker might have ideas on how to do something similar for our set of tools, to spread the load.

If any of this is unclear, let me know, I’ll try to explain better.

It is totally clear, and obviously an SC FFT issue. Try this with both 64 and 512 buffer sizes:

(
{
	var in, chain;
	in = WhiteNoise.ar(0.1);
	// eleven 2048-point FFT analyses of the same input, with no resynthesis
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	chain = FFT(LocalBuf(2048), in);
	//IFFT(chain) // inverse FFT
}.play;
)

Or just take my word for it that it is the difference between roughly 1% and 20%.

What confused me is that I thought SC FFT processes ran at their own rate, based on their window size. Clearly not true… but it should be.