Hybrid/Layered resynthesis (3-way)

Ok, so finally got to building some of this since I’ve had a bunch of time on my hands recently.

The idea is to build on the onset descriptors idea I’ve posted elsewhere on the forum: analyze an incoming bit of audio (generally a percussion/drum attack) and then create a hybrid/layered resynthesis of it. The transient is quickly analyzed and replaced with an extracted corpus transient, then a mid-term window of time is analyzed and crossfaded to an appropriate sound/layer, and finally there’s a long-term fadeout, which again would hopefully be hybrid.

So the general idea is something like @a.harker’s multiconvolve~, where multiple windows of time are analyzed and stitched together as quickly as possible.

At the last plenary, after the concert, I started building something that extracted all the transients from a corpus in an effort to start with simple “transient replacement”. After some messing around and playing with settings, along with @a.harker’s suggestions, I got something that works and sounds ok, and have built a patch that does this (not-super-tidy code attached below).

code.zip (98.5 KB)

The first bit of funky business is that transient extraction (via fluid.buftransients~ at least) is not a time-based process, so even though I’m slicing off the first 50ms chunk of audio before running the transient extraction on it, I’m getting back “a little window of clicks”, which represents the extracted transient. It can also end up zero-padded, which I’ve then uzi-stripped out.
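(Just to make the zero-stripping step concrete, here’s what that trimming boils down to, sketched in Python/numpy rather than the uzi/peek~ approach in the patch; the threshold argument is my own addition.)

```python
import numpy as np

def trim_zero_padding(buf, threshold=0.0):
    """Strip leading/trailing samples whose absolute value is <= threshold.
    With threshold=0.0 this removes only exact zero-padding."""
    keep = np.flatnonzero(np.abs(buf) > threshold)
    if keep.size == 0:
        return buf[:0]                      # buffer is all zeros
    return buf[keep[0]:keep[-1] + 1]

# fake "little window of clicks" with zero-padding on either side
transient = np.concatenate([np.zeros(32), np.random.randn(88) * 0.1, np.zeros(64)])
print(trim_zero_padding(transient).shape)   # (88,)
```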

So I’ve done that, and then analyzed those little fragments for a load of stuff, with the intention of focusing on loudness, centroid, and flatness (my go-to descriptors for percussion stuff). Then I’ve run the whole thing through the corpus-querying onset descriptor process which I showed in my last plenary talk.

(shit video with my laptop mic (cuz JACK is too confusing(!)))

////////////////////////////////////////////////////////////////////////////////////////////////////////////////

It sounds… ok. But it is a proof of concept so far.

I think the transients might be too “long” for now, especially if they are meant to be transients transients. This is going with @a.harker’s suggested settings of @blocksize 2048 @order 100 @clumplength 100. There’s also a bit of latency too, as I’m using a 512 sample analysis window (ca. 11ms), whereas I think in context, this would be much smaller to properly stagger and stitch the analysis.

So the first part of this is just sharing that proof of concept, while opening that up to suggestions for improvements and whatnot.

The next bit, however, is to ask what the next two “stages” should be. In my head, it makes sense for the first bit to be a super fast, as-low-latency-as-possible “click”, and the final stage should be some kind of hybrid fluid.bufhpss~ cocktail of a tail, where the desired sustain is put together from as many layers of pre-HPSS’d sustains as it takes.

Where I’m struggling to think is what the “middle” stage should be. And what scale of time I should be looking at, particularly if the first stage is of an unfixed and/or unknown duration. Should I just have a slightly larger analysis window and do another HPSS’d frankenstein, or should it be a vanilla AudioGuide-style stacking of samples so it’s more “realistic”?

Is there a technical term for this “middle bit”? (i.e. not the transient, and not the sustain)

////////////////////////////////////////////////////////////////////////////////////////////////////////////////

And finally, a technical question.

What would be a good way of analyzing and querying for something that will potentially be assembled from various layers and parts? As in, I want to have a database of analyzed fragments, probably as grains and slices, each one broken up via HPSS and perhaps NMF, and then I want to be able to recreate a sample with as many layers of the available files as required (again, à la AudioGuide). (p.s. I want it to happen in real-time… of course)

Will this be a matter of having some kind of ML-y querying, and/or is this possible with the current/available tools?

This is an interesting implementation of ideas we have definitely bounced around a lot in the last few years, and it comes together in a way that is quite idiosyncratic to your practice, so I’m happy!

Not that I’m aware of. Again: transient, attack, allure, many words that mean many things depending on whom you ask. Schaeffer, Smalley, synth builders, plugin designers all refer to the first elements of a sound object when creating a taxonomy.

In the example I shared at the last plenary (APT) you saw I decided to treat 0-50, 50-150, and 150-500 ms as my onset windows, because that made sense to me then. I decided to weight them equally to describe the time series, which again made sense to me.

So for you, one way to choose it could be to analyse your replacement sound in 2 periods of time (what you already replace as the ‘transient’ and the middle bit), weight them in your search for a replacement, and then replace just the 2nd bit that way. It is a vague subversion of the idea of a Markov chain, because I use the ‘preceding element’ to choose the current one. Does it make sense?
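To make the ‘weight them in your search’ idea concrete, here is a very rough sketch in Python (not FluCoMa code; the descriptor layout, weights, and corpus values are all invented) of a weighted two-window nearest-neighbour search:

```python
import numpy as np

# Each corpus entry stores descriptors for its transient and its "middle bit"
# (loudness, centroid, flatness per window). All values here are invented.
corpus = np.random.rand(100, 6)           # 100 entries x (3 transient + 3 middle descriptors)
target_transient = np.random.rand(3)      # what was already matched/replaced
target_middle = np.random.rand(3)         # analysis of the incoming middle window

# Weight the transient ("preceding element") less than the middle bit itself
weights = np.array([0.3, 0.3, 0.3, 1.0, 1.0, 1.0])

target = np.concatenate([target_transient, target_middle])
distances = np.sqrt((((corpus - target) ** 2) * weights).sum(axis=1))
best = int(np.argmin(distances))
print("replace the middle bit with corpus entry", best)
```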

Again, this depends on what you replace, how, etc. I am exploring various ways now, but usually via the costly route of doing stuff in parallel. It is faster to prototype, and then when you find something that sounds good, you can optimise because you know what you use (for instance, just the perc part of the first bit, just the pitch part of the middle, etc.).

I hope this helps!

btw this gave me an idea for an interface that would allow you to make a sort of matrix for such pre-processing… It might take some time but I’ll post it here. Imagine having the ability to define time slots to be considered in a target, and for each slot deciding the weight of various descriptors… watch this space!

Those general ballparks work well, but I may just shrink things a bit more and treat more of it as sustain. Maybe something like 0-20, 20-100, 100+, although that may be too front-loaded for sounds that open up slowly.

Yeah, that could work. I guess part of this idea is to lean into the ‘hybrid’ approach (was that the term you guys used? I don’t remember the distinction between hybrid and synthetic or whatever else was discussed at the last plenary): basically having a sound that is made up of layers of decomposed actual sounds (e.g. HPSS, NMF, etc…).

And with latency in mind, I thought that I can get transients down to something very tiny, and analyze/query that super quickly while working on the next bit which is more perceivable.

It could just be that everything after that, in all the time windows, is a frankenstein/hybrid thing, with perhaps different weights being put on different aspects of the sounds (i.e. the “middle bit” having more weight put on the transient/energy-oriented descriptors/statistics, and the sustain having more weight put on the tone/pitch-oriented descriptors/statistics).

There’s also the “apples to apple-shaped oranges” thing, in that I want to be able to query files that are significantly longer than the analyzed audio, so some of the process will also involve extrapolating the initial analysis data out to arbitrarily long samples.

Yes please!

There are definitely several aspects of this that I have no idea how to build, or would be incredibly clunky to build with my knowledge (and tools). The pre-processing is definitely a big part of that, in that chunking up files and analyzing (and potentially tagging, since for now the querying will be done manually, rather than via ML, so knowing what’s what would be useful) will be a big faff.

I think you should analyse it twice: once for super quick replacement (first pass), and then still include it in the middle and end bits as a way to give you a better match for the sustain part. You can always weight it differently from the sustain (consider it less important in the match), but it is valid info, maybe?

It is, and I guess that could inform what gets replaced in the medium bit, but the idea would be to analyze the transient and replace that immediately.

There’s also some wiggle room (I would think) in the amount of overlap and fading available between fragments, particularly with the transient extraction which has no fixed length.

That is what I meant. You replace the transient with a fast analysis of the transient, AND you replace the middle bit with an analysis of the transient plus an analysis of the beginning of the middle bit, and you can weight them separately. The same applies to the sustain, which could come from the weighted analysis of the middle and the beginning of the tail.


Aaah right. I understand. We wouldn’t hear the transient of the middle bit, but it would instead be part of the querying for a suitable middle bit.


Indeed. And you can weight this ‘influence’ as well, and do the same for the 3rd bit with the first 2 bits of ‘influence’.


So I’ve been playing with the transient stuff again and wanted to come up with a slicing that works for me, with the intention of doing crazy fast stitching.

With the idea of replacing stuff in real-time, having front-loaded short bits makes sense since I can stitch as I go, rather than having to wait 50ms for the first chunk of audio to play. I’m also going with the names of ADSR for now, which lines up well since I’ve got four main slices I’m testing with:

  • attack: 88 samps (1.995465ms)
  • decay: 88-512 (11.61ms)
  • sustain: 512-2205 (50ms)
  • release: 2205-6615 (150ms)

For the numbers I based things on the size I’ve been getting from fluid.buftransients~ (ca. 88 samples) and what a reasonable amount of latency is for real-time use (512 samples). But it’s fairly arbitrary as it’s numerically/computationally based, rather than on what would perceptually stitch together well.
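For reference, the ms values above are just the sample breakpoints converted at 44.1 kHz; a tiny sketch to recompute them (or try other breakpoints):

```python
sr = 44100
breakpoints = [0, 88, 512, 2205, 6615]   # attack / decay / sustain / release boundaries (samples)

for name, start, end in zip(["attack", "decay", "sustain", "release"],
                            breakpoints[:-1], breakpoints[1:]):
    print(f"{name}: {start}-{end} samples, "
          f"ends at {end / sr * 1000:.3f} ms, "
          f"lasts {(end - start) / sr * 1000:.3f} ms")
```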

So I built a patch that would stitch random samples together at those breakpoints. Cool, this kind of works. There are a lot of variations you can do with the first three chunks where it still sounds believable (more on this below). The final bit… not so much. Granted, I’m literally making random assemblies, and in context, these should hopefully follow each other in some kind of markov-y way.

I then made it so you can tweak the breakpoints easily and hear the results. Here’s the commented patch:


----------begin_max5_patcher----------
6027.3oc6c81jiZil+0y7ofy6cUs2c83Q+EIk6M2lM2kT0MS181Iak5pjTcg
so6lLXvEfmY5cqc9reBILFrAagsfloCcpxgALnmme54eR5Gx+8W9hYKh+je5
Lmux4mbdwK96u7EuPcp7S7hh+8Kls16SKC8RUesYQ9eLdwuN6F8kx7+Tl5zg
wdqV6ml535Bo6tZz10wayB8yT2Jr3rAqT2g7o7J3tuo9qk83FesrLalyuTbo
GBVsxOp5CXiW1xGBht+1D+kY5uu.PlSuwAAXyA2H+W4ehPyAkOEonDDUJI4m
6e7xWl+wMFp2KiWu1OJ6HEmN2IdSVPbjWX3iNKevK5dem3O3mH+2Ng9Q2m8v
t6ILHxeY71H0MhZDi.Em8t3nr67V52FtIPyp7MSC9apuIJWuaGibEJLhSTnC
Pj++HtU.o7mUj2Z0yZ124G9A+rfkdy5aHjLeGtsHw268ahCjpTe.ZvtCZXHN
GzXHEZAQ7wBngm6rIz6Qm6R7hduejSp25Mg98ApA5Np4BnUbGgT5XA0Pycj3
0p30R0vI0+97uTeXpw4cGznt4XFT4cBwiFuSvbm7f6KjNnytNLg0cLAB.JaG
JWgMHnJz0.hJ44z7t2uw7csCIMp9tFjrqov1DzdDfxOclMzEnhsjQO6gfTk3
3mzrRhZRIIsqj2XlhJTYobYmIEdwIS89vtF32884MQMw1Y1+Q9G+t+27O8iV
oOiyuXQ2C3h4N26mIiaj4j8fWlS5CAY8QDEb28dX5zUbVEmmwPHEnNjhimyc
wgq7SbhuqH+UuDK9BpUhBTkPVDLlN3AiaKrir9nUZLqCAd.WVfGBtLZqzeDO
XAdRjhouity23.OL9kG3oPQkQnUZpv9ZZacm5.GsomM0YxHWVmoqKYevUc8X
CiJl6uEDk6cmJUHmp+y+YXGzafsFpHzUklAC0UzyHCmgsypsId4iUzb81kaY
8FwTYVomwJGXwv8+ohQCuS6cBhthv80PmxRJab7t.0fPHJadHP2sis4jBbhI
CoaEG5RasaV8f5XWMW06xHvhJGs9rgzl2NXNV8WGzbh0Lv0U6vD5JkGvv3xZ
EX6CLafNissNqKWlzCw0kOfEcYD.tn8tjIxBjx7St0OxaQneUm3FSRW2R+Dy
xmN+E83Y4SpqAq2ttZK0W93omIXdSgqn7YmW4D5pPf12osEMYiShenuWpu4F
vzKrJDjP0yAEf4tfJ+gvFDlp5iUZQp0yBEUVPdPnuLOSZdGx9u8Kl4sYSkS+
hJ2RN57qwpGD+lxSEDoOEr7TI9eHX28iKOqWhT4xjZ11D8f.9TYjr7GSrrD8
nsAkcUp9oBQR0ijO3A4.U0CaQ0ws6x6gqWg0NztJKcWh5+Q3f8PjrK+9v3ku
2eUEaNYuxF+nfnMI9ox7wZizpWdk+cdaCyts53gfn4Md8cirpwKVNBn+PRfW
XoFbeRvp3nbgnVWQ9o20bxPWZSAZUkQ8Mh71zvMKs6j3RKWLUpjaSW3kj2SU
DpAs6hYwwg0uT48E5eWVwk2DDEc.JlEuo8KlDb+Cm3dWDKu35S8rUWI81sQ5
qdqznH61cSmw9umrvoB+15O9O4IC04kIG145hwyVdQc31GRWlDGFVSe0W4CM
bkURi7k9eLXU1CpFppwf7qGrYmQzrxd4UA26mlU+bYd2mV+LoYOpA8JmZ6hB
m3ay7kkAJ0h5egZKJUUO1pQ3pc9Sln9fUaf.kZG.g.zpWuoJPaOZ3QQDwUeV
sGTr04a.qRfi0KyfNyFtpqQy4xewtj.EAXtP3pVhg5n0+zqTqn27YcFQPmDQ
tSVib1YgEhtBVDQurKGURW+BKmxJRZ9PlKDLlvlVQvq0JRODOW5SgUzofKYP
jLYP5jr6xqIz4PmOiPG10ANHjZbfjhY1CQGOFSxFUgKo4F7fNCL7qCXvb03j
wL2mDf4DAexbV3DzNbfZFNnmDNziK2YVPz4C+fgJjAB0klpW97SBLvgAXNZl
jL2Xw8JMVzPBG17jH0xDIMVMUHVyTgpoSADBd9XpftxrQZHggGYlJINYODjr
5BpnQbc.BAo3Rhdl3wiG.4e6RKwCCrRMd501moIIBazj84e2A1cHAd5rOlDJ
AwzjmgYbnjABP13s78N6qUI3BPGjUx9fQp5bg.7nwXQgMGTkK3B.nqbbjPVM
.BQFW.zUBNjqMgjdL1EFOvQUct5+6D9TzVJgwrZXZ6+aR.IHUCU.kskqA31.
UaS5lDo7emSZ71jk9Nq8y7BSm+uz8T6jqLyNRmCCg0iWf.FVHR8Tufz27qNW
UgogVgwkedhRZpbgcqWb049SIiQq7+jswHs50NHAZCjpYZz3hsJUXgPvnB8e
rb7fXJf.uV.Y+rmmO4SsLsoJ4N+5MiTZGnh98hQI5TWnyysEDUtVB+T4Xsy+
dF0O0UYfZnLjWTQuJDPSEBXOID4gmLCJvhdDKxS03PLPLH8tTLR.CirLfj9V
LPFIF39VLvFIF8o2p4lFfdVJfFKE8UPCjgXQeZcpJwvLoneECPGykTstnjUZ
1F.epkMHscgC8jKbGYzWJaf9Q1PhQg2NDYrXL7Pjw8dn1ksdxtGZb8cPxXV3
F9dU33nBnxJJNqbvFdHB2UHZHMtL0sjO73FpuvsBivcrfS8has5VMqtt0KKK
IXw1L8vWqRquNw9n6CiW3EVvsnRto0.4jd4dgyRT.Vf.cf9jzKi9jXjd9knf
4npzmDxZjMrWM2ek8HYc4kUf5dQb+8r79sf1ntr4jppMvMmZPZBBM.7jU+hC
6fAhNP0aJq0t5pyu1oHKKPLmWq2FlSx8A6sVYiS51zLufNXGPfWH8fA71oGL
ahdvGPOX8ZpC0y0HWymVJYhcvSrCdhcvVhnhz5KH6fvSQNTwSQclsQJOEgtB
T2Wksqj9LP06wNlKddvSQn83oHQgLB9D2ypiHtOOnon8XzJ0kpYzJXhQq0gj
mOzTDCsBYFfZhxiGSHRp+x3nmBlapoHuPkRlRmXto1yQhA4FJB7Hi4l4QYuq
64iQlkQ1PZspqVAgcGYQZekr78Kf6cbaP02hsENNpX+zgNQ02Ip9NQ02Ip99
aFp9dwYrAtVH.LF4NGUrBBzQWJ6IJPOQA5IJPOQA5IJP2YJP2qLbhNJ3y3DI
nmHA8DInsHweg+Vf3uPSACf6Hg.xv14UEXLy.Y3ybV9ZrkDj26xAzX4.1aj+
j2UtEOQi3IZDOQi3m.t5RdFQiXKHb.2wQj7wQkIllMQIEMGE+YN0tg386auF
v8U7Ex8ULdTysaBsm31slyueIxsaxUvs6BlN+jys6U9K8drCp7E9aIBr3WRk
FY1MbhY2mhY2S66ySL6dhY21lY2b9fSra81xKApWTZLbXY6gga.sHxPyq6c6
+r5eCKFbbYLyqaphQPb8JqNwq6RDgBa9m3ke6xqa.64lkx0RqaMhL5rTtbRL
e075VS1P8uyJiHRLm3bWPRZ1S.qtoUbZvSr5t.UX5etPcgiPJhMwp6IVcOwp
6IVcOwp6IVcOwp6RVcC0r5F6Nwp6IVcOwpa7U9KhzDktmnz8Dktmnz8Dktmn
z8Dktmnz8Dktmnz8Dktmnz8Dktmnz8Dktmnz8Dkt+BgR2sR6UoJ3s78ly6UL
4BY0Mch1qcl1qtnIZuNQ60IZuZEpcx4ceYonWIuZDEq.kxAlBFOafwW856ds
T.VSJMLdLu2NyYcFVtx0aAi45kTf97XmclXOF.yoyYZNpMs0NWGRnvQ2F16k
xiwq08ApV2ethLMOG3wH.XEVQPzqYMejs6zd0rvhKrBIrPXwySRXw4VgCV6v
mmUbvhyrAErJMcfOOnZDt2oZDsnhO8K67yPlFItRCKnl3vPWMP8kBSitxhYz
lEOmIZD9bDMRp7X..P3EDMhPtQwnwunYZTt2fgyVdeReAtXr9KMIWzU.Z3VN
ANert5UFKYOEn13vjOOWtQj4ZmKZ+IElvaHtnukBSHQEm22RgIKHEu26QFAl
mFuDgfdVJfFKEvmXVs0uzIyXo.2qb.hOJLLJe5lE3ZXodiwxFsa4keduX2o0
9cDq4x1qOcjkaqUODrZUskdr40zHuHclP8hhPN2hZaS8p1KRuIJl6knX5skj
y9yOrMUrpyirI5E8RzKrRiv8fdsaXXGpX+wDeuL+UNjWkE3m33E4E9XZP5OG
8yQZ9X7UNbtSp25MoN+d3bgfRboqS+W+4H0tTV9UeEEhjWCN2EptPwOM0eki
77uBg.TmeOEntRhenuWp+WkuLjzWkOQzx6ScscxUdnmkwa0iWz0XzFxlch84
Lbwt6CjMmc313Gjo.aNylfca6ZgxNYYkL.rwbbQXxOZ2FYfUrYuUXgQ6gsyt
0aCyBRCCxCpW7L+f29vpUt7Og9klQ.ztf89YRgQ9z0xtx2.VQVqN2om7m2dX
qn2MUQPYHqrUdYd0tkjfFHtRi9tp3qrhEJ0UujOUIURybHRmQqLU1IRp8Auv
s9w2cL2g1qxqK.KXIoIpd0fnfrbxwTfl37+PBN0EKnrJiEoxsj9PbR1o67p8
82gsM07gwQ22gGUgzVsa7kUKO3x1HQM25DZQqS5Hw5zk69kh0IfwvBAmH3Bn
qKtiVmP6YcBGiVm1yzjLVLMK1i49RvzTk8FHDHYwCczxzVlkCmMYKEwvJ+Ul
+70uvgW3VSKVLna7xstMSCE.yU1Kbe3kLNz0kxZySBi+nw5KieM7uFoK7lhe
x2FsYj9YWztnj6gZWztqZMtm16voegt0gSgW7NG9tcR6gZmC+ixLf2Emr9yM
qbtGqbjRkaw16tqxrc8viKRBV0plqo00Mm7fvfzr5kEzZtdLWTMWOhglSaFV
n8uGP4d2Xu3AT7431lub2Z7B1s7oCpIeaYsTDux72YH2KKkE0UqRPgdytx95
m+eak2xOa97KRlcVoECTi4iP28Y++SWPn2ie1odHklGnRsoJs09jz.YIvgkz
5xjdHpZJuYjAalg0wS2o04kwYrlyOeXWCC1fgEb+iNX58cgaCVMWp8KiWuIN
0+yN+mUV3mSZDfaH+zrSFtsZ5FSh8hkYap8t8wycdcsO3zVXoDeuU2EGJGul
yOO6sdKkQSiSev469lu50+0T+jzWmDuJI393W+tGiV9524sdSne5qeiWx89N
uIXQhmbPzou9s4L+y4u3mFG4Esz+0+7rFmmbrwg6uvveDrdJNQ5W0A8dJpXH
BnDG93NGLMMHMOrR6o1LxwpPmgDMi0ALydyPMbcJ8Wun3sw63AnqZf5DeSCg
Gt3rE.4oekmaaIqcqsF6UWDgF69LssElz3bRuz1bWSTbDqeZbpQMta+z3DiZ
bZ+z3FYuQpPQhKuo.lzTXhEZJF2DCYazPlz2Qg1nkvFzRHgEZIpQQApsoHb.
cP52ltzZ7v1957Dvlz1GzeuNX0lXY4AEw5wpzMtn5qTM.m+dJfa9RGqEnAPK
vMCf39uogT2KCAw.hoHHX.TCB8xTCjd+t2D035Lmqumf0lWLnO7hMpoKCzaY
01nxHrQXRnQZIyFYYflUbDmXk1BLXHnQooo0yzbLgYrTjSyjFjvToAO.RCjx
LUb.Cg3PLVbfWo3PFNiTf40RZ4hygtFg5GHg1q0MYTQGLlw9y8zHoA5RLUb.
Ch3fMUbt19JSbH3t8jghIiMgS6oFGYTfRqjG1r4nvFsDkYjVQ6i52Lsw6kB3
Nb.BmJw20ixztnnWYaYhGpUZIhQce0eKxrlsigMNrWrcHTiqR45QYbWTzqrs
fClsCXnZI7f4OXTBh9IXlwM8YbD0KqvAaEk4s5AaAkGr8Sd7VOY6a6jGtkSp
d6sza+dGrTF620F2tJH9cp8NwaeqezV8h1raGdrN1r396BBCWFGFezNu4tU2
Yl9pk6xk69t4r2DIHPH+lbBvCwLjq5H4ATZ02LM88.2cSDJQ.P4eUhKhQnpi
3HLEmeD3faCsus.PgtE.BNfnORdJXMB4Jwxn6K1qHYU1vC2jDuINojBsywhx
u+1r36S7VETrmJTe8itovzJQd0C20C0jHc28T0Trry3MaW5cpN.ukKkO3ZfJ
FIAHk1QnRMUcDSetpDO1Oze8g2Ji65xT.JG.4b8QxmBBU+dqcSDFEhUeUj.y
E5inPf6AMXprEWlCf0aRj7KpDRnKCo6B4..jT+lyW+Q08caPTt2heY2JVfft
51bmcAhwI35cqFYoV1K1hwpgFP0MW4kFobFfx45iNR.Ozb0cWema9eL8Q5mT
uXsV.SU2+VkQx9Sa7ibdmWTpy67WGrHNb0rJHZM3Q.HPlxtSZCPXLssCRnsm
nt6h2ekNDu0KK6TNDU2cZ+oJAfqtq1JOuJPgUDn+7eH6V0sA5nepY8v0sAEz
7PlZ.kv.JqwbiQ7A9ZGnsUWibEohq7PUqot7IK6xnZwQZmxvF59ot2ce7kr+
lw2V+5vo2xja2GSq9McDzZVz+3O7e2QSYJf4p5e.tTLTYJCgHHxfTNDWHDny
gKPtDU+MU.kP+IbCf4UKniqja6qB5y1U2P6tA7JWprXu51b40gnRBIEepqNM
Zt4fpIDO8V2lkb+Iu1mlMt+tGWkDeuezOnfnVrwgtXYnMUW.i3JDEGAD1zDW
QyofkczLmTDosoiZu7HCuo51phqJ3ayF2WUr5FpheeHWrfoCXhEEomxOEEaG
ixZJAurduiO5hLkOoE625G4+AuVsUOmhaSa0utBU5a1h83j57RHFTbj0Equo
bK9eDITeqWPzmMTpj1.k4PN9H6KZoeLP5b1oHOvBG18ezKxDZTIT+OUe4JaV
lZrdhtGitIM4jiY0z13fnEGG+0l30a7W0UrxssDBG6z1ahsb7bIAeZYVRGih
zeBT9aczm6KytCLIFzPOee7J+zAvex3jzMUyiUU3yGBY.cP+9sqWbZgoonMG
mzrGDs+bvxrdJbqUEy7e5hxFcUa7tkwa7Mubi8yZsPssuzXA4Ocf7O3snqFA
l5t+DpTxgPs3rEP2vP9.llhrGqD5GhuOenLcrSAyjCiFpSnPvP0vqH0Vgjmv
pf+qqOyBKMfv6OV+U21FCX3hqdDj+xWaEkakWx6eUT9OGYuRMa.czt2zIZtb
tI537n1xTTPJievJWLmxk0gqZeKLGbvy3iX8Isv5y+lMWhwesX61D.Jd66sw
iBct0noZYezVWbhSNgmfyZNseIVN5.6M6WlIgiaCoVVLhEg9xTl5E.7qiie+
LKEb5W2DD89ttbYRsDvKSGicAxnJmIXjA2SiiK972Vm+9GYXueMLTecb0iNe
RDCZx1bIto1mVXdfMWlFv3uVdkLNbheOWVNq3l7d8Xx9wbp87TWrTXdoDmuD
5FcmJyfSKiKtqm+TyMBPPJXShPf4rhiXbqN.g0dexs7mYWiCFwqOXGVgs30t
j4s2QZuBCyU3J+L9Xx.VQXJRTOuVt4Oz18C015FZVvFRhHTgRA6HnwwGchXU
LdNqztowiFvT7WCO0p6cXVuX0eCjLcR2pvoOqOE.5MxhuNbq+qfckJcPVwZQ
uOFlnZjKaJfIweLpyR3YisZQI7O9nW2EP.Je+IWY5yvXj12AAkU+XeA7aS78
u.ITgdkSEGzMO4t8Et2JirEk40YwiS.tt5IIzEHENET5JfBp8kw+h+pNKerx
J2fki+ATjwv1x2+meXX7G6NDhnZRIw3PtpbS.UZUROGm4nB.f2GJ.5KQEPs7
Eeab3EXhTNERTYESTM4drNs01rMYS3kU7.qjMcDHU5+qBRkGFfdlJTI.cgQH
JvUnbBPbpKhaS8JweoevGN8J20jMS4.S1qSfp7K0NBmdHJ4lEFmw2kHCnIJR
DvJXJrzZVfOIZOLVQo9QqRGaYP1Cx122qQZFCXP.SuDUDWfnfK2LnUscx71b
tYIrxjQ8FurXm2jOJzYVq8Sy7uaaXX1YWfk5nCAJqKTUUXYs54S8wgTV8PhM
xoTndJscwLr9cVvU5R5ZWPUqTYWzhFY5P0sAdf3LttvFboqCFIvBqF8LyOcb
89B7w7Yy4Qymdbr6YmLPPN9qFdIt.1k98EKfiQSIgguJBMOihPg.nKKUZKSJ
VqcW..CqyY5C7m+lfjrGc9utO9hbm0u5bpUvK+YltonKTsoF9x+wK++w1xp.
F
-----------end_max5_patcher-----------

One thing that strikes me immediately in doing this is that fades are super needed. Yes, these fades.

I know there’s been some talk of this several times, but for use cases like this it’s pretty impossible without some kind of fading/smoothing, particularly as the segments get longer and further into the sample. Surprisingly, the first few bits can be smash-cut, but sustains are too jarring.
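To put a number on what a fade even is here, a generic equal-power crossfade sketched in Python (this is not anything fluid.bufcompose~ offers; the 64-sample fade length and chunk sizes are arbitrary):

```python
import numpy as np

def crossfade(a, b, fade_len):
    """Splice b onto the end of a with an equal-power (cos/sin) crossfade of fade_len samples."""
    t = np.linspace(0, np.pi / 2, fade_len)
    return np.concatenate([a[:-fade_len],
                           a[-fade_len:] * np.cos(t) + b[:fade_len] * np.sin(t),
                           b[fade_len:]])

a = np.random.randn(2205)   # e.g. a "sustain" chunk
b = np.random.randn(4410)   # e.g. a "release" chunk
stitched = crossfade(a, b, fade_len=64)
print(len(stitched))        # 2205 + 4410 - 64
```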

////////////////////////////////////////////////////////////////////////////////////////////////////////

So I wanted to share this franken-sample patch as it’s quite handy for testing. My game plan is to play with this with a full tank of gas tomorrow and find some breakpoints that work musically, then do some batch analysis with those breakpoints in mind, and try some brutalist smashing of these together.

Doing that in a sample-accurate manner is going to be important, so I will likely end up in the land of fl.lib~ for now. I don’t know if there are any playback objects in the pipeline for TB2, as I would think being able to play back bits/layers/chunks/slices/pieces would be central to the querying/matching side of things. It would definitely be useful to be able to do it “all” inside the FluCoMa-verse.

Fades aren’t going to be added to bufcompose in time to make you happy. I would suggest that

  • for real-time, using framelib to assemble the audio itself
  • for offline, using jitter for a much greater range of things you can do to a buffer (although I think the plan is that framelib is to become buffer-capable…)

Ok, taking a look at the framelib stuff for the stitching (and fades too I guess, for real-time stuff).

The offline one was just a mock up really, to see where these breakpoints would make the most sense. Being able to audition with fades is handy, but probably not worth doing a whole workaround thing for now (I came up during the “jitter is a paid extra” era, so I never really internalized doing “normal” shit with jitter).

The fade amounts will be critical too I think, especially to avoid weird artefacts in the first couple of chunks, as they are tiny. Is there a rule-of-thumb minimum fade time to avoid AM-y shit? At the moment my shortest segment is 2ms, so not a lot of wiggle room there in terms of fading.

It’s a bit of a brain fuck in terms of figuring out how the real-time analysis would relate to the stitched playback.

Even drawing it out I’m not sure I really understand how it would line up temporally.

So at the top is each slice (labelled ADSR), and the consecutive overlapping analysis windows. So analysis window 1, analysis window 2 (made up of segments A and D), analysis window 3 (made up of A, D, and S) etc…

That seems alright, and I guess sensible too, in that the second analysis window would be A+D, rather than just being D.

So playback wise (the lower three bits), nothing can happen until the analysis window of A has happened. So once that has happened, I would play back A, and then carry on playing for a length of time equal to D while fading it out. This particular D would not have been analyzed, and would just be tagging along with the analyzed A.

Once analysis window 2 has happened, I would then play back the second half of D, while fading in, then play all of S and a bit of R while fading out. So again here, the only part that was actually analyzed was the fade in from D. The S and R are along for the ride.

And similarly, when analysis window 3 is done, I would play back the ending of S, then carry on playback from there.

This makes sense in terms of latency and stitching together something that is equal length to what was analyzed. BUT in drawing it all out (it took me like 5 drawings to arrive at this version!) I’m struck by the fact that even though I’m analyzing large chunks, most of the playback in this model is of audio that has not been analyzed. It is primarily what follows what has been analyzed.
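Trying to pin that alignment down in numbers (a sketch of my reading of the scheme, using the breakpoints from the test patch above and ignoring query time entirely):

```python
# Staggered analysis-window boundaries (samples), as in the test patch above
bounds = [88, 512, 2205, 6615]

for n in range(len(bounds) - 1):
    analysed_until = bounds[n]        # analysis window n+1 covers input 0..analysed_until
    heard_from, heard_to = bounds[n], bounds[n + 1]
    print(f"window {n + 1}: input 0-{analysed_until} analysed; its match is heard "
          f"{heard_from}-{heard_to}, while the corresponding input "
          f"({heard_from}-{heard_to}) hasn't been analysed yet")
```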

My intuition tells me that this would probably sound ok, in that it would produce a kind of markov-ian thing where the bit that follows the analyzed bit has a high likelihood of having done so, and if each segment is analyzed in sequence, they would probably sound alright following each other (?!).

///////////////////////////////////////////////////////////////////////////////////////////////////////////////

One perk to this theorized approach would be that the effective latency would be down to the length of the shortest analysis window: in this case, 88 samples. That’s fucking ridiculous.

I know from all the other querying/matching that 512 samples is an ok amount of time to wait. So perhaps I can just “zoom out” this whole thing so the smallest analysis window is 512 samples, and things scale up from there. I would still have the same kind of problem with hops and such.

OR

Another approach would be to still stitch together from these tiny ass sounds (so the first playback segment would be 88 samples long), but the whole process is just postponed so it doesn’t start until after 512 samples have passed. So after that point there would be more real-world time that has been analyzed and stitched.

Ugh, this kind of shit is a real brain fuck!

Is there some obvious hop math I’m overlooking here in terms of best practice?

Man, I loaded my modular synth corpus (750 small segments) and played with this and it is really fun! It suffers from the absence of cross-fades, but as they are glitch sounds it is still really fun.

Soon the code will be on GitHub, so pull requests will be considered, should you want to code it in C++ :slight_smile: Or maybe @pasquetje, @jamesbradbury, or some other CCL whizkid will do it, who knows :wink: For the foreseeable future, our little team of 3 lovely people is working on the second toolbox and its documentation…

What I don’t understand, though, is why you want to assemble all of this in a single buffer for real-time use (for now). Why don’t you have a 4-voice player, one for each component? No need for framelib there: just have each analysis point find its best match and play along. You can even use the length/latency of the analysis to actually stitch stuff together!

I can make a drawing if what I just said is not clear, especially since some of these larger-scale multi-resolution analysis ideas come in part from my brain in that other post. Maybe I can try to be clearer…


Yeah quite fun to play with!

As an experiment I tried checking the composited buffer for clarity and then iterating until it was above some threshold, but the correlation between sounding interesting/believable and clarity wasn’t strong enough to bother putting it in for the sharing.

This is mainly to try to figure out where the breakpoints between samples should be, before figuring out the playback thing. (I just had a quick chat with @jamesbradbury, which got me on the path to doing the sequential playback with fades in fl.land~. A bit clunky, since fades may be asymmetrical given the lengths of what comes before/after each segment, but it seems solve-able.)

Yeah that would be nice. I can make sense of what each segment of playback should be, but how that relates to analysis windows, and how those analysis windows relate to “real time” is a brain fuck…

Ok, I thought about it overnight, and I’ll try to make it as clear as I can; feel free to ask for clarification. I have not implemented this in real-time yet, but it has been on my todo list for some time now, as you know… since the grant app, actually :wink:

This is a picture of your metal hit from the other thread. I’ve divided time into 3 windows instead of 4; the principle is the same. For now we’ll consider the start to be perfectly caught, so time 0 is 0 in the timeline. I will also consider the best-match answer to be immediate for now, because we just need to understand the 3 parallel queries going on. Here is what happens in my model:

  • at time 0 (all numbers in samples), an attack is detected, so the snapshot of the address where we are in my circular buffer is identified. Let’s call it 0 for now as agreed. I send this number in 3 delays~: 400, 1000, and 2200. These numbers are arbitrary and to be explored depending on the LPT ideas you’ve seen in the last plenary, but in effect, they are how you schematise time, not far from ADSR for an envelope. Time groupings. Way too short for me but you want percussive stuff with low latency, so let’s do that. Let’s call them A-B-C.

  • at time 400 (which is the end of my first slot) I will send my matching algorithm a query 400 samples long, from 0, in database A, and will play the result right away from its beginning, aka the beginning of the matching sound.

  • at time 1000, I will send my matching algo a query of 1000 samples from 0 in database B. When I get the query back, I will play the nearest match from 400 in, until its end (1000), so I will play the last 600 samples only. Why? Because I can use the first 400 to bias the search, like a Markov chain, but I won’t play it. Actually, this is where it gets fun: I would try both settings: search for a match for 0-1000 in database B1, and search for 400-1000 in database B2. They will very likely give me different results, but which one is more interesting will depend on the sounds themselves.

  • at time 2200, I will send my query to match either from 0 (C1), from 400 (C2), or from 1000 (C3), again depending on how much I want to weight the past in my query. That requires a few more databases. Again, I would play from within the sound, from the point where I actually care about my query.

Now, this is potentially fun, but it has a problem:
there will be no sound out between 800 and 1000! If I start to play a 400-sample-long sound at 400, I’ll be done at 800, but my 2nd analysis won’t be ready until 1000. The same applies between 1600 and 2200. That is ugly.

So what needs to happen is that you make sure your second window lands during the playback of the first, and the 3rd during the playback of the 2nd. There are 2 solutions to this: you can either play each window for longer, or you can make sure your window settings overlap. I would go for the latter, but again there are 2 sub-solutions: either change how you think of your bundling (changing the values to 700/1400/2100 for instance), or, with a bit more thinking, delay the playback of each step so it matches. With the values of 400/1000/2200 you would need to start your first sound at 1200, so it would play:

  • from A- start at 1200 playing 0-400 (to 1600)
  • from B- start at 1600 playing 400-1000 (to 2200)
  • from C- start at 2200 playing from 1000 up

so that would need adding 2 cues/delays, one at time 1200 and one at time 1600.
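If it helps, here is the same delayed-playback schedule written out as a quick Python check (query time still ignored, numbers as above):

```python
windows = [400, 1000, 2200]   # query points, in samples after the detected attack

# For these particular numbers, delaying the first playback to 2200 - 1000 = 1200
# means each matched chunk ends exactly when the next query result is available.
start = windows[-1] - windows[-2]
prev = 0
for i, w in enumerate(windows):
    length = w - prev                 # how much source time this chunk covers
    print(f"chunk {chr(65 + i)}: query ready at {w}, "
          f"plays source {prev}-{w} from {start} to {start + length}")
    start += length
    prev = w
```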


I hope this helps? Obviously, all of this is only problematic in real-time. Sadly, we can’t see into the future. More importantly, the system would need to consider the query time as well before starting to play, since returning the best match will never be instant, and depends on database size…

I hope this helps a bit? It at least might help understand why your problem is hard…


Awesome, thanks for the verbose response here.

So the “missing time” thing is what I arrived at in my initial idea and sketches. It’s also tricky to think about as there’s the absolute time, and the relative time (relative to the start of actual playback).

The times themselves are obviously subject to massaging (hence my test patch above, to see what kind of segmentation works out), and my initial choices were heavily biased towards the front of the file (to the point that pitch is useless for the first 2-3 analysis windows). It may be overdone, though; I was banking on having the segmentation start with an extracted transient (hence this stuff), with the subsequent bit perhaps having the transient removed, so they could potentially not even require fades.

I think the overlapping analysis windows are what I was leaning towards, but the math of it was hard for me to conceptualize.

I think, in spirit, I like your last example/suggestion, with the caveat that 1200 (in this case) is significantly too long to wait to start playback, as we’re pushing 20ms+ at that point and you can definitely “feel” that, particularly if the sound feeding the system is very short.

So I guess that, mathematically, the delay between the 2nd and 3rd sample (or the final two if more than three are used) has to be equal to or smaller than the time between the initial attack and the start of the first sample.

So if I wanted no more than 512 samples between the attack detection and the initial playback, it would have to be something like: 88/256/768. Is that right?
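Running those numbers through the same delayed-playback logic as above, just to see where the pieces land (this assumes the scheme sketched earlier, so take it as a sanity check rather than a definitive answer):

```python
def schedule(windows):
    """Gapless schedule where the first playback is delayed just enough for the
    last chunk to start the moment its query is ready (query time ignored)."""
    start = windows[-1] - windows[-2]          # initial playback latency
    prev = 0
    for i, w in enumerate(windows):
        length = w - prev
        ready = "ok" if w <= start else "query not ready in time"
        print(f"{chr(65 + i)}: query ready at {w}, plays source {prev}-{w} "
              f"at {start}-{start + length} ({ready})")
        start, prev = start + length, w
    print("initial latency:", windows[-1] - windows[-2])

schedule([88, 256, 768])
```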

The potential jitter in querying time will definitely factor in, especially if the database is multiple times bigger due to containing multiple versions of the same sample (HPSS, NMF, transient, etc…). In terms of temporal slices, the query can just be limited to the relevant ones, to avoid another dimension (or more) of querying.

My perhaps naive view of that is that the overlaps can be extended forward a bit so that a potential drop in energy mid-crossfade doesn’t become apparent. Either way these sounds will be synthetic, though it would be interesting to see how well it handles stitching the same sounds back together again.

////////////////////////////////////////////////////////////////////////////////////////////////////////

AAAND

In reality/context, I will probably use these staggered analysis windows to query and play back longer sounds than the initial analysis window (i.e. the first 88 samples would be used to query back the first 256 that will be played, then the next window (88-256) would determine what plays from 256-1024, etc…).
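Spelling out that mapping with the numbers from the parenthesis above (nothing clever here, just the two timelines side by side):

```python
# Short analysis windows driving longer playback spans, per the example above
analysis = [(0, 88), (88, 256)]
playback = [(0, 256), (256, 1024)]

for (a0, a1), (p0, p1) in zip(analysis, playback):
    ratio = (p1 - p0) / (a1 - a0)
    print(f"analysis {a0}-{a1} ({a1 - a0} samps) -> playback {p0}-{p1} "
          f"({p1 - p0} samps), about {ratio:.1f}x longer")
```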

It would be handy/musical to query and stitch together short samples, but most of the stuff I will be playing back will be longer than my analysis windows, so it’d be about mapping those two spaces onto each other in as useful/musical a way as possible. My working theory is to take the time series/stats of the short attack and kind of extrapolate that out to a certain extent. There will obviously be a very steep point of diminishing returns with that thinking though.

Actually curious if you (@tremblap) have any thoughts on that aspect of the idea, in terms of mapping short analysis windows onto long playback.

It really depends on what you want to bundle together. 512 samples 3 times in a row would do that:
0: start recording
512: playback of A from 0 to 512
1024: playback of B from 512 to 1024 (having analysed 0-1024 (B1) or 512-1024 (B2))
1536: playback of C from 1024 to 1536

(if you want to give the computer a bit of time to find it all, you’ll indeed need to make it shorter)
I find that writing out time like this helps me visualise what is happening at each point. So for instance, with 512 samples of latency max, and let’s say 128 for retrieving (2 block sizes, which is probably way too much), that looks like this:
0: start recording
512: start playback
now I need to subtract my query time. God, I hate these powers of 2, so let’s start again with simple numbers: 500 latency max, 100 safety query duration

0: start rec
500: play
so
400: latest query
so
500 will play until 900 with confidence

I do the same cycle again:
900: start B playback
so 800 is the latest query. We already covered 0-400, so that would be for 400-800
so playback is 900-1300

1300 start playing C
so 1200 is latest query, to play 800-1200


In effect, your various window sizes have a cascading effect on your overall latency. Put your numbers in there and see how they behave…
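Putting that reasoning in a loop so you can drop your own numbers in (this is just my 500/100 example from above, nothing canonical):

```python
latency = 500        # samples between the attack and the first playback
query_time = 100     # safety margin for the matching query to come back

play_at = latency
analysed_to = 0
for step in "ABC":
    query_at = play_at - query_time          # latest moment the query can be sent
    chunk = query_at - analysed_to           # new source material covered this step
    print(f"{step}: query at {query_at} covering {analysed_to}-{query_at}, "
          f"plays {play_at}-{play_at + chunk}")
    analysed_to = query_at
    play_at += chunk
```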


You’ve mentioned having multiple concurrent analysis windows which influence what samples get chosen (à la an envelope follower building up a signal which then selects between shorter and longer samples, for example).

This is definitely going to be in the mix, but I think it would fail a bit in fairly simple circumstances. Say I’m mapping a slowish envelope follower on loudness to the inverse of duration, where the busier I’m playing, the shorter the samples I play back, to avoid cluttering things. And if I play slowly, I get longer samples. That makes musical sense, but it misses potentially powerful moments where some fast playing stops dead, where it would be great to have a longer sample play back but the envelope follower is still lagging its way down (relative to the overall latency anyways).
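For the sake of argument, here’s a toy version of that mapping (a one-pole follower on a “busyness” signal, inversely mapped onto playback duration; every constant here is invented), which also shows why the lag is baked in: the follower only comes down as fast as its coefficient lets it.

```python
import numpy as np

def follower(x, coeff=0.99):
    """Simple one-pole envelope follower over a per-block activity/loudness signal."""
    y, out = 0.0, []
    for v in x:
        y = coeff * y + (1 - coeff) * v
        out.append(y)
    return np.array(out)

activity = np.random.rand(1000)                 # stand-in for per-block loudness/busyness
env = follower(activity)

# Busier playing -> shorter samples: map the envelope inversely onto a duration range
min_dur, max_dur = 0.05, 2.0                    # seconds (arbitrary)
durations = max_dur - (max_dur - min_dur) * (env / env.max())
print(durations[:5])
```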

Ok, I guess the relationship is that whatever the initial latency is, it can’t be greater than the distance between subsequent steps (although it can be smaller).

And for this I may try to fudge it more, so that each section plays back longer than the initial block and some previous file is always fading out for 100 samples or whatever.

It is a tricky thing to think about.

Another thing is trying to figure out what kind of information will be analyzed in each chunk. 512 is long enough for shitty pitch but stuff shorter than that, not so much. And if the first bit is only a(n extracted) transient, then the descriptors used to analyze it can be skewed towards that. Same goes for the longer analysis window, which can rely more heavily on pitch, rather than timbre and even loudness.
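One way that could end up looking is a per-window weight table along these lines (descriptor names and weights are pure placeholders, not anything analyzed yet):

```python
# Hypothetical per-window descriptor weights for the query stage:
# short windows lean on loudness/centroid/flatness, longer ones bring in pitch.
weights = {
    "attack":  {"loudness": 1.0, "centroid": 1.0, "flatness": 1.0, "pitch": 0.0},
    "decay":   {"loudness": 1.0, "centroid": 0.8, "flatness": 0.8, "pitch": 0.2},
    "sustain": {"loudness": 0.6, "centroid": 0.6, "flatness": 0.4, "pitch": 0.8},
    "release": {"loudness": 0.4, "centroid": 0.4, "flatness": 0.2, "pitch": 1.0},
}
for window, w in weights.items():
    print(window, w)
```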

What I’ll do, once I figure out some reasonable breakpoints, is do a macro analysis that analyzes each segment for everything, and parse out in the querying what is actually useful to have.

It is, and this is why I go verbose with numbers: better than a rule I forget is a reasoning I remember :wink:

Yes. This is what you could get, for instance, from always analysing from time 0 (preceding context), but this is only a hunch.

What I did in Sandbox #2 was to keep these mappings of immediate/short/long trends as influencers that I could assign as presets. So there was a contrarian preset and a subservient preset, where both cases were explored and recallable on the spot, quickly, so I could meta-play my ‘cyber-partner’, if you know what I mean…