Basically the hpss algorithm is way too “long” for my tastes, and ends up sounding like a weird EQ/filter on the sound, while transients~ returns a microscopic, blip-type sound.
I can see the benefit of only the tiniest fragment of sound being classed as a “transient”, but the musical usage of this is limited (it doesn’t respond to processing quite as well as a longer slice of time would).
At the moment there is no transient extractor that applies to what I would consider to be a perceptual transient (closer to an “onset” perhaps?).
Would it be possible to have a parameter or control that lets you specify the size of what is returned as a transient? Or, if it doesn’t work in a temporal way, a thresh-type control that lets you have it grab longer periods of time?
I don’t hear anything I’d describe as crosstalk here - can you record it or be more specific?
A transient isn’t a slice of time - it is the residual from a process of estimating the waveform in a detected area that looks transient-like - you can’t directly control the length of that.
Here’s a bit of brushes recorded directly from the patch.
Bits of it sound “crunchy” to me, in a way that I associate with a lack of debouncing, but it could just be that that’s what the process sounds like.
I imagine that it’s not directly a slice of time, but having control over the parameters that affect the length of the output would be useful, so one could tune it to get (potentially) longer chunks of time.
Sorry - switched threads by accident. This is what I’d expect the algorithm to sound like - there isn’t an obvious debounce issue here (that would look different in the waveform), but obviously they are short. I personally think the way you’ve set the parameters, with the skew at -10, is possibly part of that.
Isn’t another approach to transientslice~ and then take some audio after that has occurred? You’d have to deal with the latency maths depending on how that is implemented, but that gives you what you want, no?
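In sketch form, outside Max, the idea is just this (Python here purely to show the shape of it; the slice indices, window length and fade are placeholders, not anything the objects give you directly):

```python
import numpy as np

def grab_after_slices(audio, slice_points, length=4096, fade=256):
    """Take a fixed-length chunk of audio starting at each detected slice point.

    slice_points: sample indices (e.g. from a transient/onset slicer)
    length: how much audio to keep after each point
    fade: short fade-out at the end of each chunk to avoid clicks
    """
    window = np.ones(length)
    window[-fade:] = np.linspace(1.0, 0.0, fade)  # fade the tail
    chunks = []
    for p in slice_points:
        chunk = audio[p:p + length]
        chunks.append(chunk * window[:len(chunk)])
    return chunks
```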
As in, reworking the code to do so, or by manipulating the existing settings?
If the former, would it be possible to put that in? It seems like it would be a really useful parameter (unless it was played with and turned out to be useless).
That’s essentially what I’ve done in the comparison patch (crosspasted below), but it doesn’t sound quite the same.
This just has a windowed/grain sound, instead of a more transient sound. It might also have to do with the fact that the audio coming through is full range, instead of limited to the transient.
The problem is what you think the algo is doing… to make it simple: it is extracting the transient, the discontinuity. It is not a slicer, it is a layerer: it removes the transient (conceived as a sharp discontinuity in the signal) and resynthesises what should have been there under it.
@weefuzzy should soon have a clearer explanation than this, stay tuned!
Yeah, I kind of get that. I’m more pointing out the fact that none of the slicers do what I would expect them to do.
It’s been a while since I played with it, and the syntax has changed a ton since, but I remember @a.harker showing me a median filter in early FrameLib that sounded more like what I would expect from this kind of process. It could have been a matter of how the parameters were set (or even exposed), but from memory it wasn’t nearly as “blip”-y.
I think we’re getting our terminologies crossed, insofar as we’ve taken to using ‘slicers’ as those objects that just tell you where something happened, rather than trying to extract something like an onset. (Maybe that’s the problem – that our taxonomy doesn’t make any sense? )
There is a FrameLib demo called 7 - Spectral Median Filter and another 7b - Spectral Median Filter, both in the demos folder of my FrameLib. The latter, I believe, should be the same as our HPSS in mode 0.
The former is similar to just using the percussive filter part of HPSS (and with something more like coupled threshold mode). Something in the same ballpark as that patch: fluid.hpss~ @harmfiltersize 3 @percfiltersize 43 @maskingmode 1 @harmthresh 0 3 1 3 @fftsettings 4096 1024 -1
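For reference, the generic median-filter HPSS recipe looks roughly like this in Python. This is a sketch of the textbook approach, not literally what fluid.hpss~ does internally, and the filter-size arguments only loosely mirror @harmfiltersize / @percfiltersize:

```python
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import median_filter

def hpss_percussive(x, sr, n_fft=4096, hop=1024, harm_size=3, perc_size=43):
    """Rough median-filter HPSS: keep only the 'percussive' layer.

    harm_size median-filters each frequency bin across time (horizontal),
    perc_size median-filters each frame across frequency (vertical).
    """
    f, t, S = stft(x, sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(S)
    harm = median_filter(mag, size=(1, harm_size))   # smooth across time
    perc = median_filter(mag, size=(perc_size, 1))   # smooth across frequency
    mask = perc**2 / (harm**2 + perc**2 + 1e-12)     # soft (Wiener-ish) mask
    _, y = istft(S * mask, sr, nperseg=n_fft, noverlap=n_fft - hop)
    return y
```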
Ah right, I see that. I just had some early demo slides/patches somewhere that I was certain wouldn’t work anymore.
In testing those patches out, yeah they definitely sound like HPSS.
And the terminology does get a bit confusing too (even you said “extract an onset”, which makes sense to me, but is not what is happening in the object, or audibly).
So, technically speaking, a transient is a layer that is extracted from the sound, and that layer is just a discontinuity. As such it has no “length” other than however much of the sound contained that transient peak. (And in effect it almost sounds like an EQ’d click~.)
An onset isn’t really accounted for anywhere (other than the fabled onset/offset in fluid.ampslice~), and even then only to denote when one is starting. There is no function to extract an “onset” as a unit of audio (in either the temporal or spectral domain), is that correct?
So if all of that is correct, does it then follow that what I’m after (a transient-like extraction, but one long enough to be identifiable) doesn’t exist?
Yeah, it’s muddy everywhere. The way we tend to think of it is:
an onset is the beginning portion of some event, and that may or may not contain something that sounds like a transient.
a transient is some very short thing that could be part of the onset of a sound, or a sound in its own right; something like a texture could be heard as many transients.
We’re treating transients as a type of layer in so far as they’re a sort of proto-sound that can be juxtaposed vis-à-vis sinusoids and shaped noise. This particular algorithm does happen to model them as discontinuities, because it’s a declicking algorithm (I’ll have a bash at explaining how it works tomorrow, and seek some feedback on whether the explanation makes sense). This does have the consequence that it only models these shaped, very slightly extended clicks. I don’t know enough about its innards yet to know if it could be induced to behave differently.
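To give a flavour of the declicking idea before I attempt the proper explanation: fit a predictor on the audio around a region flagged as transient-like, resynthesise what “should” have been there, and keep the residual as the transient layer. A toy Python sketch of that general shape (emphatically not the object’s actual algorithm; the order and region bounds are arbitrary):

```python
import numpy as np

def split_transient(x, start, end, order=20):
    """Toy declicking illustration: predict through a flagged region [start, end)
    from the audio just before it, and treat the residual as the transient layer.
    Assumes start >= 4 * order so there is enough context to fit on."""
    context = x[start - 4 * order:start]              # clean audio before the region
    # least-squares AR model: predict each sample from the previous `order` samples
    rows = np.array([context[i:i + order] for i in range(len(context) - order)])
    targets = context[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)

    smooth = x.astype(float).copy()
    for n in range(start, end):                       # extrapolate through the region
        smooth[n] = smooth[n - order:n] @ coeffs
    transient = x - smooth                            # residual = the transient layer
    return transient, smooth
```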
I don’t know of a canonical algorithm but that doesn’t mean that there isn’t one. I suspect it would be difficult to generalise, partly because there’s such a range of behaviour between different sounds, and partly because there isn’t really a definitive point where something ceases being the onset and starts being the next-bit. Then things get more difficult again when you’re dealing with mixtures of sound.
I think if one knows the range of sounds one is trying to deal with, it gets easier because you can make some assumptions (rather than trying to deal with snare + stick and violin + bow with the same model). For instance, someone has tried using NMF to discover the ADSR parts of a sound:
To do that, I guess, you’d seed NMF with some initial estimates for the activations, i.e. where in time the bits might be, perhaps allowing cross fades between them. You probably wouldn’t seed any dictionaries, unless you had some prior idea of what the spectra for the different zones might be. Then set it going, allowing it to update your seeded activations. However, you’d want to be reasonably sure that you were running it on a single isolated event for the results to make sense (for which you could use a first round of NMF to try and separate the incoming audio into events, using pre-seeded dictionaries?).
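A rough sketch of that seeding on a magnitude spectrogram V, using generic Euclidean multiplicative updates in Python. The three-region H_init in the comment is just an illustrative guess at attack/body/tail regions (overlapping regions would give you the crossfade idea), nothing canonical:

```python
import numpy as np

def seeded_nmf(V, H_init, n_iter=200, eps=1e-9):
    """Seeded NMF sketch: V ~= W @ H, Euclidean multiplicative updates.

    H_init holds rough guesses of *where in time* each component is active;
    W (the spectra) starts random and is learned. Both get updated, so the
    seeds steer the factorisation rather than fix it."""
    n_components, n_frames = H_init.shape
    n_bins = V.shape[0]
    rng = np.random.default_rng(0)
    W = rng.random((n_bins, n_components)) + eps
    H = H_init.astype(float).copy() + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# e.g. three components roughly covering attack / body / tail of one event:
# H_init = np.zeros((3, n_frames))
# H_init[0, :10] = 1; H_init[1, 5:60] = 1; H_init[2, 50:] = 1
```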
Ok, that’s as I’ve come to understand it, and it makes sense.
I remember mention of this. So I guess that’s why the scale of time is so microscopic, because it’s intended to remove these discontinuities in a manner that leaves the original audio largely unaffected (uneffected?!). Like a use case for me would be to apply some dsp to just the transient, and other than some time-based effects (or general ‘transient shaper’ dynamics stuff), these extracted transients don’t seem to respond to much (i.e. distortions, filters, etc…). Unless I’m paying really close attention I can’t even tell they have been removed…
This is a bit OT, but there’s been a bit of mention in a couple of threads now about where the generalizability of some of the algorithms break down and/or how often “real world” applications do some kind of contextually appropriate pre-seeding and/or algorithm tweaking. Is that level of specialization and/or fine-tuning in the plans/cards for future FluCoMa stuff?
Like, architecturally speaking, some of the stuff discussed in the thread around realtime NMF matching (and semi-reworking the concept from the machine learning drum trigger thing) just isn’t possible in the fluid.verse~ because there is no step that allows a pre-training/pre-seeding of an expected material to then optimize towards.
It’s great that all the algorithms work on a variety of materials, but from a user point of view, if they don’t do a specific type of material well, the overall range doesn’t really help or matter.
It’s very much on the cards, because one of our starting convictions is that it’s that sort of tweaking and tuning that needs to be available, where the algorithm affords it. When @tremblap talks about getting the ‘granularity’ right, this is what he’s getting at. By and large the starting approach has been to expose everything and then struggle with the interface problems that arise. One of the things we need to find out (hence involving people in the project) is how the tuning etc. pans out in practice, so we can improve on interfaces, and find a hopefully optimal blend of flexibility and usability. I realise we’re not there yet…
All this becomes a more acute concern for the next toolbox, because there will be more of these more abstract algorithms like NMF that get trained and tuned, and with that, great scope for making things bewildering.
Not quite sure I follow the bit about there being no pre-training step: pre-training and seeding is possible with NMF via the actmode and basesmode attributes? Perhaps the docs need to make more of this, but that gives you a range of ways of steering it in a supervised or semi-supervised way.
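Algorithmically, the fixed-bases case boils down to updating only the activations against spectra you trained earlier. A generic sketch (not the object’s code, and the names here are placeholders rather than anything in the attributes):

```python
import numpy as np

def activations_for_fixed_bases(V, W_trained, n_iter=100, eps=1e-9):
    """Semi-supervised use: keep pre-trained spectra W fixed and only
    estimate the activations H for the incoming spectrogram V."""
    n_components = W_trained.shape[1]
    H = np.full((n_components, V.shape[1]), 1.0 / n_components)
    for _ in range(n_iter):
        H *= (W_trained.T @ V) / (W_trained.T @ W_trained @ H + eps)
    return H
```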
As before, we still don’t know the extent of creative possibilities here, because there hasn’t been much creative work with this stuff before. As an algorithm, it definitely has its quirks, and seeing which of those get in people’s way will be helpful in narrowing down what might be the most helpful extensions to NMF to add (there are loads).