NMF-based speech V/UV decomposition => transient slicing

As requested by some of you earlier today, here is my quick and dirty attempt at NMF-based separation on speech sounds (I dropped this sound in the patch, typical speech: https://freesound.org/people/Speedenza/sounds/167554 ). It gives very convincing results when it comes to getting smooth voiced sounds on one side and all the noises / fricatives / breaths on the other (it sounds better, in my opinion, than the strategy based on pitch tracking, e.g. gbr.psy~, where it is an abrupt binary decision on a frame-by-frame basis). The unvoiced content is then chopped by the transient slicer and the slices are sequenced randomly.


----------begin_max5_patcher----------
3274.3oc2cs0bihbE9Y6eEcTs6KarcnuBjmbxef7PdLUJWHTKYFi.E.M1d1Z
8u8zzcirjsnUOVmlY7NUMx5Bvguy8yoO.+9kWLad8Sx1Yn+N5+ft3he+xKtP
+U8ewE1Oewr0YOkWl0p2rYcxm5lusqqtZ1Ule99hpN8OM7EaZjsxptrth5p6
Zj4cliuHhcSzUHRL6Ft5OQ5OndE8es6W+gNutrtwr8Q2PYoQjqNxav6sWyWs
29f6OnQ2DmjPn792whi3IIGtG0a6JkccOuQZ1I0It4+ZfLrUscOWJO.WUaWW
To1SMi.O.1rtbEGX0d.MNQiMNVchbbjVrPefqm+kqSiFN7Kqq5pxVan4+noH
qD8OqKWL7yEqppUznrH+g8ou5jxfG8YE09sM0aqVH0TQzS9gy0F0wuS1bmrJ
atAcC+VobUV9yGHBv6IWzmTzY8eweb4k8ubkmJL40qWqzFl81CVYQamrB0Ui
5tWhVtsrDss5q0E4xcP9iKBRhn6KBvLhSY.d1Q4lQe.39wsORiGNYug7maCD
drV5vniAz8jMIIedrOHPZez1k0zosMZUXR8p7+sUVkq3jmu4AOUrm..yDNk.
ovYcnNBykMybomMOSgPHTxXQ53LbiUE2nkQFAiwGGiDWZFeD7KeTQt2Iq+qH
7n7juGaNhC1QhnmcfIIlfRoZlyXrC9wYG3O.jOzY36fGzh6HaPWStFrQvGCN
7MhHsAU8uV9u6sbaGWc2WbG4RtltOtEoNUywgG2YUKpWivQAWg1JvIBCvSbB
bRvAt7oMMpfIQ+seo.e9Rb7osj4DMviicA73joRhGOUdvTkQnE3Xm3NN33VE
OnoVc5DAiusSqpiisp5bmHGPeaikaRibgIs8NkvuMuPsM5rTTm1nGK5tGsnX
4RYS+WuKxY64m1BAaBdEiwZS+DmY0GG8yPV8XBVKtnj3gS1+jlUONwXX5UZ8
hOQk8hgLs9EM0aPYn1MRY98nkEkRz8JKE.LNhh2m8ioTm7+X3LNVKaayVIOh
WhMkYpJW9E.hG5v6HU6EHVGbP3LXnfBmqwd4XuzyYsLuBOekhFvXd0VWFibC
+XnAOh+MBbg39sWPjaFEgsJayrRHDgwLcRr7DrVElJbIECeNrsuO28yvtTX.
ENBqa7Yry7VRYvYWNlKnqu9ZTlRx8baQKpdoo+a8NhTAB5SnAUl8rJdMRscm
O7wwlnBXa7OL1Ym3nvA+kk0pCxjzqAAWCNVJdrdMrrtYcldqEGISE7OztOLU
V4Blsu3bOrxE.lFaW8pUN7TCUWMMMYlgMo+3rADhwbhMMRbSoKhonzESQaIQ
Zlh6JW3SfmcYI5Df9JXxJINR2IFZpOJ6wjfGJuMa8l1t50subJK8qztNy5.r
qbTlox0XmcmRjFbtPGpDU5YJamQ3fHCpSMg93It50pHI3nVUZ5FY0BjdIEBZ
uort3YphP5QdhybcDhfau+k5hJDAcaWSwpU84zD1RQFfON0lxmS+cBdvw+e4
ZTz3g1+9LyImTge.2Tmhcb3a992JQqZp2tAQfwV+zPmZW7VA0IzESfGtBTgy
Tc.bcbssf2lhOyo1NIN3XeJVFMKjS8YYzngOglMR4CuXVgXs28VfL1omNlNg
Xcx4bkUogekW1npW8EjdzQt1T55MDGZBullCXk+Yc6Khsk55LYONCNCAWk3u
Rg59Fx2tQch1jUhxUaYScwBjpXTjLK+dSg9.UmepcgWowZSBqyfwpyO7cejJ
BZndr0CPZpFntGu.b3WMJkzNqqqAoyye1oZ4ODZ8DhIZepPqtm3Nx2HRbJjQ
9lilexZ6fxleG5wwlJbbF6iOAw8mRziSMI3GS4dfd.y5Yhlljc3KxDf2Yybn
.lG+XdyvQwpLLEjXNNIr90n78spScGbO7ox2Ur1w.iYRpAnT4IQltTjxzKuL
24ZIQArr8dG2aKfqv79imYQpjyq2VkuScxgerDyxTicGHiEAm.OXndS1hE6M
ljt.MMwLJMtAc7m.P29f7QePr.6ChS+Df3t6ajs2OOK+AG3llDY67HwCbyi9
zf6kOtvCX2aU6ArIeBf8iEUKpers3aRevsWF1bZ3CbuooVUZd6aKO0mJ1goL
EahLbcq2wINCpwCeGImusexv9dZYAjIsaWK9zTcGJOQ+Y4guacKK2Vr3FEOQ
OHc8yQmlu7B51gP2HFCGMY46ln9qtgVNmlV9mg.h0MKdMwQGQD8LxPxm.HOu
rN+Ae8P5Uh.7zOcWH.CyRTThGWH.XRR3CAzJ6di6+qfZo37Xlpn14uwcM5XB
+GReZmvtzNvP3FECLN1MCI7iW2PrPMKY5hBZmtPyBW4dI4YgODX+HYjNIijw
PPNA1r.FN6ROK7KKetJ.emrAE4.+1Er6iLp17SyIrSmi6QRjMhmAVnxRdRRH
VP0SiYhSuhzv229hptPulkCPlwMY04bMKIg+BFpcSiBYKQGVZzBYadSwltZU
Xgecgxn3Zr5ELx75day0saW+1OOPixhJo1rZeVRPlzkcL0Dy0hB0LtW7Qhlv
mhQaaR8hxLITvItzlvzIYgOJNIxg5ZbgamvEiTm5dU+vAG7a+VQ+.ai8aget
BNWJV9.KhaRxz4EeIbK8m7aKxxeY14O0NwZImvbgjYVqZ1H1tiX5t6JHpNue
7nCRSRdqWOrCIM7MKgEYGcUauzbmoHc5ZVRd85M0sxWlr1hvEldDD4rsHLxO
GgN+oKVIgDqG+aN1LIrFmFiEqjE9olZmlT05kuft895M8cMAQ3BzsOVTo+.N
hvP2tbYm4mhXInaKToqqu.Oa6KeR8qEkpuQcbdGO29Sa2rHqS1OApMYUOn+q
r84pt6eceLUBeTgBcpzuE1QTN0beYIcjrgSCtfYBlfucgv4lAUx8D7E9QTVs
MEqcbcJ8AFYSWnGqq5kR08408zpRCeGP1MntN5wMXItZgtYjNhcl5FgEbnqB
WuvjUVnMxsp3TyUHqabSmrPX5q.Qkm0C5G3utHnWdBVU.hIsu96mBt3EgObt
I88oN48TxdKEHyoK.BFtb2Gq2OpPr5K2sVSHWUP3qP6kQCprd0c6Fc00xEEY
UgsCQV6EgcRucavDEbkj83Eimwa+cSOHsPRzon41.AmLwKedPE6VqByra6ts
fgOofM0kOeXQfGllXPuZl2U3Gwbamg5bgRXtu1kay9pbwcpMTc3uqekJKlus
ybKG8hc7jKlIWO2be6POB3FtXv3k6YQMsLTltmyXg6lN+yNCMnUOYc5apWBG
88UvDIz4RYcRXVzbpIB5X0w5Q5C5cSCg2bO3US59u+PoSa81l7ALMbeKD8J0
UbvthJMOduMhX2nipA3Mk3dPo961XmOkXdPo9anamOkH9PIP3dXewD9boj5X
PNAg56J+YCoAYsaEBF.PJ1Gk7A8yyiRdo5wffRVYsSJ0eABe9bOeT8DPHm5u
98OIl5uDXOatW+0LuebOPnzo3dXHzHDBOnDGDJ4iqbAHxIermN7rwL1dilMM
3TF+dJGcdTl5AkYfvcw95AFd1qOjliONoOS9qOY7..2sezGOMDgH4.dxjQIu
RWDB+lb9jgIluw2NaJQmLLQlLJ4kODPnTzTgI1jY4x7wxkBBkhmLLI7I2JHH
jOtHXPTQIyGyIFDt8X9XNwfH7DaxLmn9XNQgf6Q8xbBhRUn9XNM.7yiR9juB
FCAk7wdh.gkK0G6IAHbOermnPXO0uX2mzsGEBtGwqRJgvdR23O7Tv8HdYOAg
VNwqxj4PQoSx8fH9Dg6KkvPPoSp6AgkKwmzxoQPPIeRKm.Bk7wuGFDMBrGsw
k.hkazjgIeDSP3h.6SBKorPzbBrOs.kKlJPhiOdGtNul+f8I.FAhJtw93rW.
gxIlOUIti8wwXLHTh5gxHCCEkNoFAHxI+xdBjdXg8KZIHgVv94GFBJ40RQBE
gv9nT.uGJeH8aLOfJBfOxQADK.K9G0ZqfgYoULC+P1lMeU1zZ2aMMmsN6KlG
EOwWo+XQk4i54geVi7qECau9ZbcVVS98Ecx7tsMlw53I6SOfYqqUDtZagEzJ
zdoc7Ody.Wr6IPz5rmD8SEi9LWAtkYaK6Njg79G8Ru9RzdOqV18TE5thpdLJ
Gd.Oo2tgW1+QzTVd+fNdvCCpj9MKglRvB8i3onzjHlcuLLyqrBu9G2VucxVl
sVc1WLv42WXe.f0y9hKHevoDiP4jT8ykpXBmoeGIlwwPeZcvPAd7Srued77U
KKJK2An8mYpgAFZ1plrEEudeU0he7NYhP8u3qN561iT1ciLrau93458u6861
vdQRivoZYO0J66em5qvuYuxpVYXxj3cO6q5erjUuotYv5Tsqo619sc06.56F
TriK9Nz5vOonY1m7V6RKyX7gG8YLElUmyeTMK097GW9+AGLpnG.
-----------end_max5_patcher-----------

This is incredible! I have a recommendation, which @weefuzzy might be able to confirm and reinforce, or else teach us both how to improve :wink:

  • You are processing the full file in 10 ranks to separate out the one rank you will then use. This is sub-optimal, since you run the NMF twice on the full buffer. What my patch did was run the first 10-rank decomposition on only 2 seconds to save time, then use the resulting dictionaries as templates for the full buffer. So there are 2 ways forward with your patch:
  1. Do as I did: in your case, process voice 0 88200 (the first 88200 samples, i.e. 2 seconds at 44.1 kHz) would be the first message to use. I get similar results that way, though with more segments.

  2. Run the full 10-rank decomposition first, as you do now, then use fluid.bufcompose~ to sum the time-domain reconstructions of the ranks you want as ‘voiced’, instead of re-running a full fluid.bufnmf~ on it.
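
For anyone who finds the two options easier to read as code than as patch cords, here is a rough sketch of the idea outside Max. It is not how fluid.bufnmf~ is implemented: it works on a tiny made-up magnitude matrix rather than the STFT of a buffer, it uses plain multiplicative updates, and every name in it (`nmf`, `resynth`, `matmul`, the toy data) is hypothetical. It only shows the shape of the workflow: learn the dictionaries on a short excerpt, keep them fixed while fitting activations on the full material, then sum only the components you want.

```python
import random

def matmul(a, b):
    """Matrix product on plain nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def nmf(v, rank, iters=200, bases=None, seed=0):
    """Factorise v (bins x frames) as w*h with multiplicative updates.
    If `bases` is given, w is kept fixed and only the activations h are fitted."""
    rng = random.Random(seed)
    eps = 1e-9
    n_bins, n_frames = len(v), len(v[0])
    if bases is not None:
        w, rank = bases, len(bases[0])
    else:
        w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_bins)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_frames)]
             for k in range(rank)]
        if bases is None:  # dictionaries are only updated when not seeded
            ht = transpose(h)
            num, den = matmul(v, ht), matmul(w, matmul(h, ht))
            w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(rank)]
                 for f in range(n_bins)]
    return w, h

def resynth(w, h, keep):
    """Sum the reconstructions of the selected components only."""
    wk = [[row[k] for k in keep] for row in w]
    hk = [h[k] for k in keep]
    return matmul(wk, hk)

# Toy magnitude "spectrogram": 4 bins x 12 frames mixed from two made-up sources.
true_w = [[1.0, 0.0], [0.8, 0.1], [0.1, 0.9], [0.0, 1.0]]
true_h = [[(t % 3) + 0.5 for t in range(12)],
          [((t + 1) % 4) + 0.2 for t in range(12)]]
v_full = matmul(true_w, true_h)
v_short = [row[:6] for row in v_full]                # stands in for the 2-second excerpt

w_short, _ = nmf(v_short, rank=2)                    # cheap pass: learn the dictionaries
w_full, h_full = nmf(v_full, rank=2, bases=w_short)  # full pass: activations only
voiced = resynth(w_full, h_full, keep=[0])           # sum only the wanted components
```

Which component index ends up being the "voiced" one is not known in advance, so in practice you would still pick the indices by ear, as in the patch.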

I hope this makes sense! If not, send questions along

pa