Hey there! I am working on something similar now, but I cannot really confirm yet if it works. My dataset consists mostly of short sounds (short scratching gestures on various objects), and here is my list of descriptors that I am trying to use now:
length
loudness stats
attack loudness (first 100 ms)
attack strength (attack loudness / mean total loudness)
spectral shape stats
grain density (length / graincount)
spectral grain density (length / spectral graincount)
transient density (length / transientcount)
tonal strength (mean loudness of harmonic component / mean loudness of percussive component, via HPSS)
The idea with the "grain density" stuff is to use the ampslice, noveltyslice and transientslice to get an idea of the grittiness, or granularity, of the sound (and maybe some vague spectral morphology). It might be BS, I have to test and see. When I have the feature set I UMAP it down to 3 dimensions; at the moment it looks like this. There will be some spatial granular synthesis involved, that's why I wanted the 3D.
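To make the arithmetic concrete, here is what those density-style features boil down to, sketched in JS (all names and numbers below are made up, and I am assuming the slice counts and loudness figures have already been pulled out of the slicer/loudness analyses):

```js
// "Density" as defined above: total length divided by slice count,
// i.e. the average spacing between slices (bigger = smoother).
function density(lengthMs, sliceCount) {
  return sliceCount > 0 ? lengthMs / sliceCount : lengthMs;
}

// Attack strength as defined above: attack loudness (first 100 ms)
// over mean total loudness (linear values here, purely illustrative).
function attackStrength(attackLoudness, meanLoudness) {
  return meanLoudness !== 0 ? attackLoudness / meanLoudness : 0;
}

// One invented sound, with counts from ampslice / noveltyslice / transientslice
const sound = {
  lengthMs: 850,
  ampSlices: 12,
  noveltySlices: 3,
  transients: 4,
  attackLoudness: 0.8,
  meanLoudness: 0.5,
};

const featureRow = [
  sound.lengthMs,
  attackStrength(sound.attackLoudness, sound.meanLoudness),
  density(sound.lengthMs, sound.ampSlices),     // grain density
  density(sound.lengthMs, sound.noveltySlices), // spectral grain density
  density(sound.lengthMs, sound.transients),    // transient density
];
console.log(featureRow);
```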
I could never get a good intuition for MFCCs so far, so I am kind of avoiding them... I tried earlier to use them as general descriptors, but it always turned out that I could have had the same result (at least on my dataset) with fewer, more targeted descriptors (like loudness, centroid, or flatness). But then again, maybe a high-res MFCC + UMAP combo would make a lot of sense for a "general purpose" application.
That's a cool idea. I was using time centroid before to give me the overall "longness" of a sample with good effect, but this is another interesting way to get at a similar idea.
Hmm, this is interesting. Do you mean you have like a really sensitive transient detector and are using that like a zero-crossing thing?
It would be cool to have something like that natively, though I guess the idea is to have spectral flatness and/or pitch confidence act in a similar way.
The video is great. Would love to hear what the sounds are if you have anything like that in context (or even just browsing the 3D projection).
Almost, but maybe on a slightly more "macro" level. I have some sounds in the dataset which are more granular than others (like a slowly rotating rattle), and I wanted to distinguish between these and the "smooth" ones. There are also some objects that produced strong "clicky" transients 2-4 times; that's what I want to listen for with the transient slicer. And when it comes to longer gestures, there are ones that are just one continuous "note", and ones that have several slightly different such "notes" in a similar time period, but not necessarily with strong attacks. These are the ones I want to catch with the novelty slicer, though this is the part I am most unsure about at the moment.
Thanks! It is really just for visualization, and I am not even sure it clustered the sounds meaningfully; it is a work in progress... I'll make some examples with sounds as soon as there are some results!
Yes, it is quite similar to the help file: numneighbours=5, mindist=0.2, learnrate=0.2, iterations=50. I actually swapped the standardization for robustscaling (following @tremblap's advice) and it looks quite different now (see video link below); there are far fewer completely isolated pockets than before.
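For comparison outside of Max, roughly the same reduction could be sketched with the umap-js library and a hand-rolled median/IQR scaler (parameter names differ from the FluCoMa objects and I am not sure every setting has an exact equivalent, so treat this as an approximation rather than the same pipeline):

```js
import { UMAP } from 'umap-js';

// Hand-rolled robust scaling per feature column: subtract the median,
// divide by the interquartile range (a crude quantile is fine for a sketch).
function robustScale(rows) {
  const scaled = rows.map(r => r.slice());
  for (let c = 0; c < rows[0].length; c++) {
    const col = rows.map(r => r[c]).sort((a, b) => a - b);
    const q = p => col[Math.floor(p * (col.length - 1))];
    const median = q(0.5);
    const iqr = (q(0.75) - q(0.25)) || 1;
    for (const r of scaled) r[c] = (r[c] - median) / iqr;
  }
  return scaled;
}

// Placeholder data: 200 sounds x 8 descriptors of random values
const data = Array.from({ length: 200 }, () =>
  Array.from({ length: 8 }, () => Math.random()));

// Settings chosen to mirror the ones above (numneighbours=5, mindist=0.2,
// iterations=50); I am not sure umap-js exposes a direct learnrate equivalent.
const umap = new UMAP({ nComponents: 3, nNeighbors: 5, minDist: 0.2, nEpochs: 50 });
const embedding = umap.fit(robustScale(data)); // array of [x, y, z] points
```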
I made a little test with sounds, and I think the clustering works quite well. Here is an example. Flucoma is great!
Interesting. To me the pockets are sometimes where UMAP sends the novel, unique things, but a smoother space can also be useful. I need to try robust scaling more!
Yes! The point cloud is a single [jit.gl.mesh] that gets the vertices from a matrix that I fill from the dataset (actually that dataset-to-matrix thing is a good idea for an abstraction). That [jit.gl.mesh] lives in a [jit.world] of course, and there is a [jit.gl.camera] that I animate with a [jit.anim.drive]. Here is the part of the patch that does it:
The WASD navigation comes from the [jit.anim.drive @ui_listen 1]. But there are two "modes" for [jit.gl.camera]: if you always want to look at the center and just rotate around, use @tripod 1 @locklook 1, but if you want to move around freely (though without a comfortable way to rotate), use @tripod 0 @locklook 0. Also consider using the @speed attribute of [jit.anim.drive].
Hi again! Sadly, raycasting is not so intuitive in Jitter (at least for me), but I figured out something. Here is an updated "explore the dataset" example with better navigation, a more precise query point, and prettier drawing (also with more sounds: added some percussive ones).
I also made a little example patch, @tremblap suggested that @rodrigo.constanzo might be interested.
Hope that some of it could be useful for the awesome dataset plot abstraction of yours! world-navigation-example.zip (20.2 KB)
That's a great idea! I was thinking webgl, but have no clue if it's possible to use flucoma in a client-side js context. (A nodejs or wasm port/build of flucoma would be so nice!)
I ran it all through Emscripten at some point and, happily, it seemed to work, but I didn't get round to then figuring out what a JS API should look like or how usable it was in practice. It's on the long list of things to return to one day.
I also have to admit that I have never used WASM before, but people started to rave about it around half a year ago (or I noticed it then), and I have been planning to try it out since. What I vaguely remember is that it should work by just sending function calls from JS into WASM and then taking the results back and doing something with them. (Nothing more concrete though.)
It's terrifyingly good. Emscripten wraps up (almost) all the low-level WASM details flawlessly, and makes it relatively easy either just to send function calls back and forth or to interact with C++ classes. Even the bits of our code that have some hand-rolled SIMD ported over without too much hassle. My JS chops are much better now than they were when I played around with this, so I feel like I'd be more confident in putting something sensible together: I think the goal would be to treat JS as another "host" just like Max / SC etc., and make a generic wrapper for it.
Once the codebase has stabilised a bit (major changes incoming, albeit slooooowly), it'll be worth revisiting, and I'll enthusiastically welcome anyone who wants to help get a JS version together!
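Just to make the "function calls back and forth" part concrete, here is a minimal sketch of what the JS side of an Emscripten build might look like; the exported function name and signature are invented for illustration, not anything from the actual FluCoMa codebase:

```js
// Assumes a C/C++ function like
//   extern "C" double analyse(const float* samples, int n);
// compiled and exported via Emscripten (the name is made up).
Module.onRuntimeInitialized = () => {
  // Wrap the exported function as a plain JS function
  const analyse = Module.cwrap('analyse', 'number', ['number', 'number']);

  const samples = new Float32Array(44100); // one second of silence
  // Copy the samples into the WASM heap and pass a pointer across
  const bytes = samples.length * samples.BYTES_PER_ELEMENT;
  const ptr = Module._malloc(bytes);
  Module.HEAPF32.set(samples, ptr / samples.BYTES_PER_ELEMENT);

  const result = analyse(ptr, samples.length);
  Module._free(ptr);
  console.log('result from WASM:', result);
};
```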
You could also take the ljudmap approach and embed the flucoma stuff server side, doing calls from the client however you see fit. At one point I explored that in Flask; obviously that won't be very scalable, but it works okay in a remote context. A JS wasm API could be super powerful! Maybe something that uses an audio worklet so you can hook things up in a normal graph, and you would get compatibility with libs like tone.js for freeee.
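The audio worklet route would basically mean wrapping the WASM calls in a processor that sits in the normal Web Audio graph, next to tone.js nodes. A bare-bones skeleton (just a passthrough with the analysis left as a comment; the names are invented):

```js
// processor.js - runs on the audio rendering thread
class FlucomaProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    const input = inputs[0];
    const output = outputs[0];
    for (let ch = 0; ch < output.length; ch++) {
      if (input[ch]) output[ch].set(input[ch]); // plain passthrough for now
    }
    // ...call into the WASM analysis here and post frames back:
    // this.port.postMessage(analysisFrame);
    return true; // keep the processor alive
  }
}
registerProcessor('flucoma-processor', FlucomaProcessor);
```

```js
// main.js - the node plugs into a normal Web Audio / tone.js graph
const ctx = new AudioContext();
await ctx.audioWorklet.addModule('processor.js'); // inside a module or async fn
const node = new AudioWorkletNode(ctx, 'flucoma-processor');
node.port.onmessage = e => console.log('analysis frame:', e.data);
// source -> node -> ctx.destination, like any other AudioNode
```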
I don't have the headspace to fully dive in, but it's quite natural navigating stuff this way (though the gamer in me had to turn the navigation speed waaaaaay up). Having the selection be a radius in "front" of the camera is a good solution, as it removes some of the problem of where the (2D) mouse pointer is in the space.
I'm sure this is easy enough to change, but when I have headspace I want to try to tweak it so it's possible to set color and size (as fed by a dimension in the dataset) per point.
That would be frickin' amazing. Imagine all the projects you could pull together with flucoma.js + webaudio/tone.js + webgl/three.js. The benefit of making it work client side would be crazy: it's not just that you could create fast apps that just run without installation, but you could also make desktop/mobile apps from the webapps...
I'm not a big player in the "making JS packages" game, but I would happily help with testing or making example apps/docs.
Well, that would still be a cool feature, though it would be a pain to program; I have no idea how I would start. One thing that is still a bit confusing for me about the getviewportray method is that the ray won't always go straight "ahead" according to your camera position and orientation. Instead, all rays originate from the camera position, pass through the picked point on the plane of the screen, and continue in that direction to the end of the world. That means casting a ray at exactly the middle of the screen is the only case that will give you a ray looking "straight ahead". Maybe this could actually be useful for the mouse pointer thing, or maybe it would make it difficult; I can't see that now.
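To illustrate the geometry (plain vector maths, not the actual Jitter API): the ray always runs from the camera position through the picked point on the screen plane, so only the screen centre gives you the camera's "straight ahead" direction:

```js
// Tiny vector helpers, just for the geometry
const sub = (a, b) => a.map((v, i) => v - b[i]);
const norm = v => { const len = Math.hypot(...v); return v.map(x => x / len); };

// A getviewportray-style pick: origin at the camera, direction towards the
// picked point on the screen (near) plane.
function pickRay(cameraPos, screenPointWorld) {
  return { origin: cameraPos, dir: norm(sub(screenPointWorld, cameraPos)) };
}

const cameraPos = [0, 0, 5];
const screenCentreWorld = [0, 0, 4];    // point straight ahead on the screen plane
const screenCornerWorld = [1, 0.75, 4]; // point off to the side on the same plane

console.log(pickRay(cameraPos, screenCentreWorld).dir); // [0, 0, -1], straight ahead
console.log(pickRay(cameraPos, screenCornerWorld).dir); // tilted off-axis
```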
Color: yes! Size: no. Currently the point cloud is just one big jit.gl.mesh with @draw_mode points. You can set colors per point if you provide a color matrix (the indices of the position and color matrices then refer to the same points). But changing the size of individual points is not possible, as far as I understand. In the example I posted I faked making the activated points bigger by overlaying another jit.gl.mesh, with a bigger @point_size, on top of the original one.
To be able to set the size of each point individually, you would have to do what Federico showcased at the Notam Max Meetup: create them one by one, most probably in a JS script. But I have a feeling that asking Jitter to render thousands of separate objects would be asking for trouble...