This is damn cool. I was unconvinced when she was demonstrating the Python code with output, but the poetry toward the end was definitely intriguing. I particularly enjoyed the random walk through phonetic similarity space. I see this as something I’d have been really interested in about a year ago, when I was writing a piece made primarily of sounds from vocal sources.
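For anyone curious how a “random walk through similarity space” can work mechanically, here is a minimal sketch. It assumes each word already has a feature vector (the toy 2-D vectors below are made up for illustration, not real phonetic features), and at each step it jumps to one of the k nearest words by Euclidean distance. This is not claimed to be the method from the talk.

```python
import math
import random

def nearest(vectors, word, k):
    """Return the k words whose vectors are closest (Euclidean) to `word`."""
    here = vectors[word]
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: math.dist(here, vectors[w]))
    return others[:k]

def similarity_walk(vectors, start, steps, k=3, seed=None):
    """Random walk: at each step, jump to a random word among the k nearest neighbours."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(nearest(vectors, path[-1], k)))
    return path

# Hypothetical toy "phonetic" vectors, for illustration only.
toy = {"hands": (0.0, 0.0), "bands": (0.1, 0.0), "brands": (0.3, 0.0), "breasts": (5.0, 5.0)}
print(similarity_walk(toy, "breasts", steps=4, k=1))
```

With k=1 the walk is deterministic and just follows nearest neighbours; a larger k gives more variety at the cost of drifting further from the sound you started with.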
Yeah that was my favorite bit. It’s a shame she didn’t have the text on screen. I would have liked to read it (plus I’m not much for this kind of bland poetry reading).
“Her breasts, the hands on her breasts, her hands, her breasts, the breasts, her hands, her breasts on the hands on her breasts…”.
Edit: actually thinking about it now, I wonder if the dimensionality reduction techniques they talk about could be useful to draw on. That is, taking a vector space of 800+ dimensions (in some cases) and reducing it down to 50ish dimensions that encompass the “meaning” or “sound” of a word.
I could see easy parallels with having a really complex descriptor/feature space analyzed across a wide range of material, and then reducing that down to something more useful and/or comparable (though probably “illegible”, like MFCCs).
I was away for a while… Too much work here. I never forgot Flucoma though and maybe … you !!!
A little messy thought:
That thread makes me think of something I have been exploring for a while. OpenMusic people may know the ZL library, which uses Lempel–Ziv (roughly, the ZIP algorithm). It is used in OM for both notes and text, and I love how easy it is to control.
Nice @rodrigo.constanzo! I also believe sequencing is sometimes missing for time-based stuff. This is why a Markovian approach is sometimes easier for me, though not always.
I love the fact that LZW separates an analyzed sequence into a dictionary on one side (which would be a cluster for a SOM) and a sequence of dictionary codes on the other. Super cool: their sizes can change independently and dynamically. So you can ask for large bits of sequence generated from short segments, etc.
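Here is a minimal sketch of that dictionary/sequence split, assuming plain LZW over arbitrary hashable symbols (words, cluster labels, etc.). It is not the ZL library’s code, just the textbook algorithm: the encoder returns the phrase dictionary it grew and the list of codes pointing into it.

```python
def lzw_encode(symbols):
    """Split a symbol sequence into (dictionary of phrases -> code, list of codes)."""
    # Start with one entry per distinct symbol (sorted for a reproducible order)
    dictionary = {(s,): i for i, s in enumerate(sorted(set(symbols)))}
    phrase = ()
    codes = []
    for s in symbols:
        candidate = phrase + (s,)
        if candidate in dictionary:
            phrase = candidate  # keep extending a phrase we already know
        else:
            codes.append(dictionary[phrase])
            dictionary[candidate] = len(dictionary)  # learn the new, longer phrase
            phrase = (s,)
    if phrase:
        codes.append(dictionary[phrase])
    return dictionary, codes

def lzw_decode(dictionary, codes):
    """Rebuild the sequence by looking each code up in the phrase dictionary."""
    inverse = {code: phrase for phrase, code in dictionary.items()}
    out = []
    for c in codes:
        out.extend(inverse[c])
    return out

words = "her hands her breasts her hands her breasts".split()
d, codes = lzw_encode(words)
print(codes)
```

Because the decoder only needs the dictionary and the codes, you can play with either side independently, e.g. reorder or resample the code list while keeping the learned phrases.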
Adding sequencing to the clustering provides something better than just random walks. Better means more control; maybe not perceptibly…
Using RNNs works fine too (see the TensorFlow demos), but I never had the chance to go really far with that. I may be wrong, but they may not be easy to control.
So why not add dimensions to all that and play with horizontal and vertical sequences of MFCCs? 800+1 dimensions, the extra one being time? Or 800+2 if one has two representations of time…
I hope I will have the extra time to dig further into something like the “Continuator”, “OMax” or “PyOracle”.
To be continued. ;9
What I was thinking of, on my part, was the chaining of higher-level ‘phonemes’ to build a dictionary… It’s not clear yet, but imagine if larger chunks (‘words’) were thought of as classes (noun, verb, etc.)… or something more ‘machine learny’, like what they do, with a sort of vector of transitions between each class, or even each ‘word’… Not clear yet, as you can see, but once I get chunks that are a valid gestalt for me, I want to be able to think about how they chain together. I know @weefuzzy has done some cool stuff in that direction already, and @groma also had many ideas…
One might also take inspiration from strategies for creating conlangs, as they have lots of similar requirements, such as constructing a grammar, designing a phonology, etc. I had a friend who made his own radio show in his own language, and it was weirdly convincing, as if it were a language people used day to day. If I can find the link, I will post it.
You could hack a speech recognition engine and “reverse” its use. I was using this for my own stuff and for Georges Aperghis.
For instance, a long time ago, using the now-dead Sphinx library, I made an mxj external (Java within Max) that retrieved the positions of phonemes recorded into a buffer~. The engine used an n-gram file (often 3-gram) together with a language-specific phoneme dictionary. Sphinx provided several languages.
It was possible either to use the whole dictionary (not great) or to narrow the search using a JSpeech Grammar Format text file containing a probability tree of the text to be recognized (super great).
The external output lists of the recognized text and the positions of words and phonemes (!!!) in the buffer~. You could then concatenate them à la Aperghis, etc.
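Once you have word (or phoneme) positions in a buffer, the concatenation itself is simple. Here is a minimal sketch, assuming a hypothetical recognizer output of `{word: (start_sample, end_sample)}` with one occurrence per word, and the buffer as a plain list of samples. A short linear fade on each chunk edge (the `fade` length in samples is also an assumption) helps avoid clicks at the joins.

```python
def concatenate(buffer, segments, order, fade=0):
    """Rebuild audio by splicing recognized segments in a new order.
    segments: hypothetical {word: (start, end)} sample ranges in `buffer`.
    fade: length in samples of a linear fade-in/out applied to each chunk."""
    out = []
    for word in order:
        start, end = segments[word]
        chunk = list(buffer[start:end])
        n = min(fade, len(chunk) // 2)  # never let the two fades overlap
        for i in range(n):
            g = i / n
            chunk[i] *= g          # fade in
            chunk[-1 - i] *= g     # fade out
        out.extend(chunk)
    return out

buf = [float(x) for x in range(10)]                       # stand-in for a buffer~
segs = {"her": (0, 3), "hands": (3, 6), "breasts": (6, 10)}
print(concatenate(buf, segs, ["her", "breasts", "her", "hands"]))
```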
I do not know if that still works. There are much better engines now that would be worth implementing in Max.
A dinosaur from the permafrost I would REALLY LOVE to wake up and implement in Flucoma… In another life.