Detecting identical or very similar samples in a corpus

@jamesbradbury's work with FTIS may be of interest here:

This as well:

Ultimately you’ll have to decide which criteria (and thresholds) to use for similarity. I would imagine morphology, duration, and time series being important factors, especially if you want to remove sounds that sound identical.
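
To make the "criteria and thresholds" idea a bit more concrete, here is a rough sketch in plain Python. It is not how FTIS does it. It assumes you already have one descriptor time series per file (for example a per-frame loudness envelope) plus each file's duration in seconds. The function names, the number of resampling points, and both threshold values are made up for illustration and would need tuning against your own corpus.

```python
from typing import Sequence


def resample(series: Sequence[float], n: int) -> list[float]:
    """Linearly interpolate a descriptor series to exactly n points (n >= 2)."""
    if not series:
        raise ValueError("series must not be empty")
    if n < 2:
        raise ValueError("n must be at least 2")
    if len(series) == 1:
        return [float(series[0])] * n
    out = []
    for i in range(n):
        # Fractional position of output point i in the input series
        pos = i * (len(series) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def is_near_duplicate(
    series_a: Sequence[float],
    series_b: Sequence[float],
    duration_a: float,
    duration_b: float,
    max_duration_ratio: float = 1.05,  # hypothetical threshold
    max_distance: float = 0.02,        # hypothetical threshold
    points: int = 32,                  # hypothetical resolution
) -> bool:
    """Return True if two samples pass both a duration and a shape check."""
    # Criterion 1: durations must be within a small ratio of each other
    ratio = max(duration_a, duration_b) / min(duration_a, duration_b)
    if ratio > max_duration_ratio:
        return False
    # Criterion 2: mean absolute difference of the time-normalised series
    a = resample(series_a, points)
    b = resample(series_b, points)
    distance = sum(abs(x - y) for x, y in zip(a, b)) / points
    return distance <= max_distance


# Toy envelopes: b is a tiny variation of a, c has a different morphology
env_a = [0.0, 0.8, 1.0, 0.6, 0.3, 0.1, 0.0]
env_b = [0.0, 0.79, 1.0, 0.61, 0.3, 0.1, 0.0]
env_c = [0.0, 0.1, 0.3, 0.6, 1.0, 0.8, 0.0]

print(is_near_duplicate(env_a, env_b, 1.00, 1.01))
print(is_near_duplicate(env_a, env_c, 1.00, 1.01))
print(is_near_duplicate(env_a, env_a, 1.00, 2.00))
```

Checking duration first is a cheap way to skip pairs that can't be duplicates under this definition. If you do want a short file to match a long one, you would drop or loosen that first check and let the time-normalised comparison do the work.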

When you mention the variations in duration, do you mean that a short file might sound identical to a long file, or is that just the lay of the land in the corpus and you will be comparing files of similar duration regardless?