Detecting identical or very similar samples in corpus

rodrigo.constanzo · October 17, 2023, 9:42am

@jamesbradbury's work with FTIS may be of interest here:

https://phd.jamesbradbury.net/tech/ftis/

This as well:
https://discourse.flucoma.org/t/segmentation-by-clustering

Ultimately you’ll have to decide what criteria (and thresholds) you use for similarity. I would imagine morphology, duration, and time series are important factors, especially if you want to remove sounds that sound identical.
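
To make the criteria/threshold idea concrete, here is a rough Python sketch, not anything taken from FTIS or FluCoMa. It assumes you have already analysed each file into a duration and a loudness envelope (a list of dB values over time). The function names, the `max_duration_ratio` and `max_distance` thresholds, and the example numbers are all made up for illustration, so you would need to tune them by ear on your own corpus.

```python
import math
from itertools import combinations

def resample(series, n):
    """Linearly resample a list of floats to n points (n >= 2),
    so envelopes with different frame counts can be compared."""
    if len(series) == 1:
        return [series[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(series) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out

def envelope_distance(a, b, n=16):
    """RMS difference between two envelopes after resampling both to n points."""
    ra, rb = resample(a, n), resample(b, n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)) / n)

def near_duplicates(corpus, max_duration_ratio=1.1, max_distance=1.0):
    """corpus maps a name to (duration_in_seconds, envelope_in_dB).
    Returns pairs whose durations and envelopes both fall within
    the (hypothetical) thresholds."""
    pairs = []
    for (name_a, (dur_a, env_a)), (name_b, (dur_b, env_b)) in combinations(corpus.items(), 2):
        # Criterion 1: durations must be close (assumes durations > 0).
        if max(dur_a, dur_b) / min(dur_a, dur_b) > max_duration_ratio:
            continue
        # Criterion 2: the shape over time must be close.
        if envelope_distance(env_a, env_b) <= max_distance:
            pairs.append((name_a, name_b))
    return pairs

# Made-up example data: two nearly identical kicks and one long pad.
example = {
    "kick_a": (0.50, [-6.0, -12.0, -24.0, -40.0]),
    "kick_b": (0.51, [-6.2, -12.1, -23.8, -40.3]),
    "pad": (4.0, [-30.0, -20.0, -20.0, -30.0]),
}
print(near_duplicates(example))  # [('kick_a', 'kick_b')]
```

Comparing every pair gets slow on a large corpus, and a single loudness envelope is a crude stand-in for morphology, so treat this as a way to think about which criteria and thresholds matter rather than as a finished tool.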

When you mention the variations in duration, do you mean that a short file may sound identical to a long file, or is that just the lay of the land in the corpus and you will only be comparing files of similar duration anyway?