Computer-assisted corpus exploration presentation

jamesbradbury · October 18, 2020, 7:40pm

Hi everyone,

As part of the Joint Conference on AI Music Creativity, I’m presenting a short paper and a demonstration of software for computer-assisted corpus exploration.

There are many great papers, and @alicee is one of the keynotes! I will be presenting briefly in the morning session on Thursday 22nd (BST). If anyone is interested in the paper, I am happy to send along a copy, and if you can spare the Zoom fatigue, please do come and watch and ask questions.

b.hackbarth · October 22, 2020, 6:40pm

Sorry that I missed this! Please send the paper as I’d be interested in taking a look.

jamesbradbury · October 23, 2020, 11:37am

Here is the video (excuse the sloppy editing): https://youtu.be/oZOMLMdM_1Q

and the paper: https://boblsturm.github.io/aimusic2020/papers/CSMC__MuMe_2020_paper_6.pdf

b.hackbarth · November 17, 2020, 11:27am

Thanks for the paper James! An interesting read.

I’m a descriptor statistics newbie, but I’ve tried implementing something similar to what you’re doing in audioguide (mfcc min/max/mean/std/kurtosis/skew, and the same for the delta) and get pretty good results with UMAP scaling. Better results than I was getting just using averaged mfcc values.
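For anyone following along, here is a minimal numpy sketch of that kind of summary vector. The function name `summarise`, the (frames x coefficients) layout and the random stand-in data are assumptions for illustration only; this is not audioguide’s or FTIS’s actual code. One such vector per sound is what would then be handed to UMAP.

```python
import numpy as np

def summarise(frames):
    """Collapse a (n_frames, n_coeffs) descriptor matrix into one vector.

    Returns min/max/mean/std/skew/kurtosis per coefficient, computed on the
    frames and again on their first-order difference (delta).
    """
    def stats(x):
        mean = x.mean(axis=0)
        std = x.std(axis=0)
        # Guard against constant columns so the division below is safe.
        safe_std = np.where(std > 0, std, 1.0)
        z = (x - mean) / safe_std
        skew = (z ** 3).mean(axis=0)
        kurt = (z ** 4).mean(axis=0) - 3.0  # excess kurtosis
        return np.concatenate(
            [x.min(axis=0), x.max(axis=0), mean, std, skew, kurt]
        )

    frames = np.asarray(frames, dtype=float)
    delta = np.diff(frames, axis=0)  # frame-to-frame difference
    return np.concatenate([stats(frames), stats(delta)])

# Hypothetical input: 200 frames of 13 MFCCs (random numbers stand in for audio).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 13))
vector = summarise(mfcc)
print(vector.shape)  # 13 coeffs * 6 stats * 2 (raw + delta) = 156 values
```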

A question for you/those in the know. In this approach I assume that all descriptor statistics are weighted evenly for dimensional scaling, e.g. that mfcc1-mean is as important as mfcc1-min. However, have you played around with the idea of weighting statistics differently to favour certain ones over others? My intuition is that an average mfcc value should be more important than, for instance, the minimum of the first-order difference.
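To make the question concrete, one way to try it would be to standardise every column and then scale each one by a per-statistic weight before the reduction step. This is only a sketch: `weight_features`, the three-column layout and the weight values are all made up for illustration. It also assumes the reduction works on Euclidean distances, in which case a weight of w scales that column’s contribution to the squared distance by w squared.

```python
import numpy as np

def weight_features(features, weights):
    """Standardise each column, then scale it by its weight.

    features: (n_sounds, n_features) matrix of descriptor statistics.
    weights:  (n_features,) array; 1.0 leaves a column as standardised,
              0.0 removes its influence on Euclidean distances entirely.
    """
    features = np.asarray(features, dtype=float)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant columns carry no information; avoid /0
    z = (features - features.mean(axis=0)) / std
    return z * np.asarray(weights, dtype=float)

# Hypothetical layout: columns are [mfcc1-mean, mfcc1-min, mfcc1-delta-min],
# deliberately given very different raw ranges.
rng = np.random.default_rng(1)
features = rng.normal(size=(50, 3)) * np.array([1.0, 20.0, 0.1])
weights = np.array([1.0, 0.5, 0.25])  # illustrative values only, not tuned
weighted = weight_features(features, weights)
print(weighted.std(axis=0))  # each column's spread now equals its weight
```

Standardising first matters here: without it, the raw range of each statistic acts as an accidental weight of its own.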

jamesbradbury · November 17, 2020, 11:52am

Thanks for having a read! I’m not great at statistics, or even a numbers person really. My approach, rather than narrowly curating statistics, is to ‘let the machine figure it out’, which is where dimension reduction comes in. There is always the problem of sausage machining your numbers though, which makes the values themselves fairly meaningless, unless it’s PCA, where it’s standard deviations.

In my approach everything is evenly weighted before it hits the UMAP part, but now that you’ve raised the question - I think that it’s likely to produce better results if certain stats are heavily weighted over others. Min/max might actually be misleading, especially if those values are derived from a single frame from thousands for example. An approach would be time weighting maxima/minima, or an average/median of the maximums. As you’ve demonstrated before in ag, you could also do content-aware weighting with the loudness of the frame etc to get rid of mostly silent stuff. Actually, that really simple code snippet you posted somewhere else on the forum showing weighting with numpy inspired me to put post/pre hooks into FTIS for each stage of analysis, allowing you to do custom weighting.
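A rough numpy sketch of two of those ideas: a loudness-weighted mean that ignores near-silent frames, and a ‘robust maximum’ that takes the median of the k largest values rather than trusting a single frame. The function names, the dB loudness input and the -60 dB floor are assumptions for illustration, not what ag or FTIS actually do.

```python
import numpy as np

def loudness_weighted_mean(frames, loudness_db, floor_db=-60.0):
    """Mean of each descriptor, weighted by the linear amplitude of each frame.

    Frames at or below floor_db get zero weight, so near-silent frames do
    not contribute at all.
    """
    frames = np.asarray(frames, dtype=float)
    loudness_db = np.asarray(loudness_db, dtype=float)
    amp = 10.0 ** (loudness_db / 20.0)  # dB -> linear amplitude
    amp[loudness_db <= floor_db] = 0.0
    if amp.sum() == 0:
        return frames.mean(axis=0)  # everything is silent: plain mean
    return np.average(frames, axis=0, weights=amp)

def robust_max(frames, k=5):
    """Median of the k largest values per descriptor, not the single max."""
    frames = np.asarray(frames, dtype=float)
    k = min(k, frames.shape[0])
    top_k = np.sort(frames, axis=0)[-k:]
    return np.median(top_k, axis=0)

# Hypothetical data: 1000 identical frames plus one wild, near-silent frame.
frames = np.ones((1000, 2))
frames[10] = [50.0, 50.0]
loudness = np.full(1000, -20.0)
loudness[10] = -80.0

print(frames.max(axis=0))  # the plain maximum is set by the single outlier
print(robust_max(frames))  # the outlier no longer decides the value
print(loudness_weighted_mean(frames, loudness))  # outlier frame has zero weight
```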

tremblap · November 17, 2020, 11:56am

This is what I’ve been trying to find out for the last year. I’m exploring the balance between curating the number of descriptors, their respective weights, their redundancies in a given corpus and query, and their scale (lin or log, with whitening and other preprocessing), and I cannot get generalised answers… so I embrace the mess and explore for each query until I get to something inspiring at that moment with that query… if I get anything more generalisable, I’ll share, but for now it is quite messy…
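As one example of the kind of preprocessing I mean, here is a small numpy sketch of log scaling followed by PCA whitening. The `whiten` helper, the two made-up descriptors and the `eps` value are assumptions for illustration, and this is just one of many possible recipes, not a recommendation.

```python
import numpy as np

def whiten(features, eps=1e-9):
    """PCA-whiten a (n_sounds, n_features) matrix.

    The output columns are decorrelated and have unit variance, which
    removes redundancy between correlated descriptors.
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # covariance is symmetric
    # Rotate onto the principal axes, then rescale each axis to unit variance.
    return (x @ eigvecs) / np.sqrt(eigvals + eps)

# Hypothetical descriptors: a centroid in Hz and a loudness in dB that is
# partly correlated with it.
rng = np.random.default_rng(2)
centroid_hz = rng.lognormal(mean=7.0, sigma=1.0, size=300)
loudness_db = rng.normal(-30.0, 10.0, size=300) + 5.0 * np.log(centroid_hz)

# Log-scale the frequency descriptor before combining it with the dB value.
features = np.column_stack([np.log2(centroid_hz), loudness_db])
white = whiten(features)
print(np.round(np.cov(white, rowvar=False), 3))  # approximately the identity
```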

tremblap · November 17, 2020, 12:03pm

@b.hackbarth I should add that @groma very convincingly used mean-stddev-min-max, and the same on the 1st derivative, in his FluidCorpusMap paper. The code is in Python, so you can have a peek at the normalisation he is doing in there.

b.hackbarth · November 17, 2020, 12:19pm

Thanks pa. You mean this one: https://www.nime.org/proceedings/2019/nime2019_paper060.pdf ?

It seems like, according to 3.4 Summarization and 3.5 Dimensionality reduction, no weighting of features is done before data reduction.

b.hackbarth · November 17, 2020, 12:28pm

+1. I was also thinking that this could become an issue with samples with differing amounts of background noise/tape hiss/etc.

Right back at you. audioguide 1.5 is out and, inspired by your paper (and others), it includes new descriptor types for things like centroid-minseg and noisiness-stdseg. I also added a ‘-stats’ suffix which is automatically translated into a combination of min/max/std/mean/kurt/skew. AudioGuide - Documentation

You can now do things like spass('closest', d('mfccs-stats'), d('mfccs-delta-stats')) to get the same feature arrays you are working with in FTIS.

Also, a very crude API is up online and working: https://github.com/benhackbarth/audioguide/blob/master/apiExample.py

tremblap · November 17, 2020, 1:14pm

I’ll ask him later if he doesn’t jump on it here as I really, really worked on this in the last few weeks and I keep bumping into new ‘best practices’ that are too situated… it is very much in my head

jamesbradbury · November 18, 2020, 2:19pm

Awesome! I’ll take a look, because my stats implementation is a tangled code blob that needs redoing. As a result it’s slow, and every time I look at it I forget what’s going on.

I’ve only been keeping an eye on this so far, not because it doesn’t look mature or nice, but because my main time sink right now is writing my thesis. I’ve got a major version of FTIS incoming as part of freezing it for submission, but after that I’m totally unhinging it again and working on more experimental branches. AudioGuide is first on the list to get incorporated, as having my way of reasoning around it would make it potentially central to my practice.