I’m building a patcher that tracks speech/voice sounds (which can be sung) and translates them into the movement of shapes in a video.
Think of it as an audio visualiser, but focused on the human voice.
Vowels are meant to move one part of the video, and consonants another.
The vowel part reads very well, and I’m managing to catch the noisiest, most sustained consonants, such as "s" and "f", by tracking the spectral centroid. With other consonants, especially shorter ones, it doesn’t always work.
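
For reference, here is a rough Python sketch of what I mean by the spectral centroid (the patcher itself is not written in Python). The function names, the frame length, and the sample rate are assumptions made up for illustration, and the naive DFT is only there to keep the example self-contained.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (only practical for short frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    n = len(frame)
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: no meaningful centroid
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


def sine(freq, sample_rate, n):
    """Hypothetical test signal: n samples of a sine wave at freq Hz."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    sr = 8000  # assumed sample rate for the demo
    low = spectral_centroid(sine(500, sr, 64), sr)
    high = spectral_centroid(sine(3000, sr, 64), sr)
    print(low, high)  # the 3000 Hz tone gives the higher centroid
```

In my case a frame with a high centroid is treated as a noisy consonant like "s" or "f", and a lower one as a vowel; where exactly to put the threshold is something I tune by ear.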
Can anyone suggest other descriptors, or combinations of them, that would help me track all consonants?
Thanks