Detecting identical or very similar samples in corpus

Ah right, yeah that makes more sense, and should actually be more straightforward.

The reason I asked about duration is that I wasn’t sure if you were saying that a 500ms sample could be “similar” to one that is 20s long (same contour, but just over a longer period). That kind of stuff starts getting a lot hairier.

With duration differences as small as you’re describing, if you are using summary statistics then it should be irrelevant, as it may amount to just a couple of frames’ difference going into the summary itself.
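To make the summary statistics point concrete, here is a rough sketch in Python with NumPy. Everything in it is an assumption for illustration: the random arrays stand in for real frame-wise descriptors (for example 13 coefficients per frame), and `summarise` and `summary_distance` are hypothetical helper names, not part of any particular toolkit.

```python
import numpy as np

def summarise(frames):
    """Collapse a (num_frames, num_descriptors) array into one fixed-length vector."""
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.min(axis=0),
        frames.max(axis=0),
    ])

def summary_distance(a, b):
    """Euclidean distance between the summaries of two frame sequences."""
    return float(np.linalg.norm(summarise(a) - summarise(b)))

rng = np.random.default_rng(0)
base = rng.normal(size=(40, 13))             # stand-in: 40 frames, 13 descriptors
near_copy = np.vstack([base, base[-2:]])     # same material plus two extra frames
other = rng.normal(loc=3.0, size=(40, 13))   # unrelated material

d_near = summary_distance(base, near_copy)
d_other = summary_distance(base, other)
print(d_near, d_other)
```

Because the summary has the same length regardless of how many frames went in, two extra frames only nudge the mean and standard deviation slightly, so `d_near` stays far below `d_other`. A threshold on that distance is one plausible way to flag near-duplicates.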

Beyond that, something like the dynamic time warping in this object/thread will be handy, but it’s still in the pre-alpha stages. That shouldn’t be necessary for differences as small as you’re suggesting though.
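For reference, this is roughly what dynamic time warping does, as a minimal textbook sketch in Python with NumPy. It is not the pre-alpha object mentioned above, and `dtw_distance` is a hypothetical name. It uses the plain O(n*m) dynamic programming recurrence with no windowing or normalisation.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of frames (1-D or 2-D arrays)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]   # treat a 1-D contour as frames of one descriptor
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

contour = np.array([0, 1, 2, 3, 2, 1, 0])
stretched = np.repeat(contour, 3)              # same contour over three times the length
different = np.array([3, 2, 1, 0, 1, 2, 3])    # inverted contour

print(dtw_distance(contour, stretched))
print(dtw_distance(contour, different))
```

The stretched contour aligns at zero cost because every repeated frame can map back onto its original, which is exactly the "same contour over a longer period" case. The inverted contour cannot be aligned that way, so its cost is greater than zero.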