And actually, a related follow-up question, though not different enough to start another thread.
What is the intended (?) way to process/separate datasets based on information in labelsets?
More specifically:
- I have 5 sounds I want to use to create classes with my drums. I train/create the classes.
- I want to break a corpus into 5 clusters, such that each cluster corresponds with a class.
- I want to put each cluster in a different dataset (or more specifically, a different kdtree) so that I can search for the nearest neighbor within each cluster.
If I’m understanding things correctly, I need to use the labelset generated by kmeans to break the corpus/dataset into 5 separate datasets, which will each then be fit to a kdtree. So on input/analysis, I figure out what class the sound is, then once that’s determined I pass the analysis off to the relevant kdtree to find the nearest match.
So that would require me using a labelset to break apart the dataset, which leaves me in a similar predicament as above (having to data munge both the labelset and dataset).
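To make the workflow I'm describing concrete, here is a rough sketch in plain Python rather than in any FluCoMa object. Everything in it is an assumption for illustration: the dataset and labelset are modelled as dicts keyed by point id, `split_by_label` and `nearest_in_cluster` are hypothetical helper names, and a brute-force search stands in for a per-cluster kdtree.

```python
from collections import defaultdict
from math import dist

def split_by_label(dataset, labelset):
    """Group dataset points by label; both dicts are keyed by point id."""
    groups = defaultdict(dict)
    for point_id, features in dataset.items():
        groups[labelset[point_id]][point_id] = features
    return dict(groups)

def nearest_in_cluster(groups, label, query):
    """Brute-force nearest neighbour within one cluster
    (a stand-in for fitting one kdtree per cluster)."""
    cluster = groups[label]
    return min(cluster, key=lambda pid: dist(cluster[pid], query))

# Toy data (made up): ids -> feature vectors, ids -> cluster labels
dataset = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.1, 0.0], "d": [0.9, 1.0]}
labelset = {"a": "0", "b": "1", "c": "0", "d": "1"}

groups = split_by_label(dataset, labelset)

# Once the incoming sound has been classified as "1",
# only that cluster is searched.
match = nearest_in_cluster(groups, "1", [0.0, 0.0])
```

Because the groups live in a dict keyed by label, the same code works for any number of clusters, which is exactly the part that seems awkward to do with separate datasets.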
In an ideal world the cluster/label would just be another column in the dataset that would be used for filtering (or biasing) but not distance matching. Or perhaps for distance matching with an absolute condition (cluster==5, radius==0.1, etc.). But that’s not really the paradigm.
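As a sketch of what I mean by "another column used for filtering", again in plain Python and not something the current objects offer as far as I can tell: the cluster id travels with each row but is never part of the distance calculation. The row layout and the `filtered_nearest` function are hypothetical.

```python
from math import dist

def filtered_nearest(rows, query, cluster=None, radius=None):
    """rows: list of (point_id, cluster, features).
    The cluster column only filters candidates; distance uses features alone."""
    best_id, best_d = None, float("inf")
    for point_id, row_cluster, features in rows:
        if cluster is not None and row_cluster != cluster:
            continue  # absolute condition, e.g. cluster == 1
        d = dist(features, query)
        if radius is not None and d > radius:
            continue  # outside the allowed radius
        if d < best_d:
            best_id, best_d = point_id, d
    return best_id  # None if nothing satisfies the conditions

# Toy rows (made up): id, cluster, features
rows = [
    ("a", 0, [0.0, 0.0]),
    ("b", 1, [1.0, 1.0]),
    ("c", 0, [0.1, 0.0]),
    ("d", 1, [0.9, 1.0]),
]

match = filtered_nearest(rows, [0.0, 0.0], cluster=1)
```

With this shape there is a single dataset and the conditions are supplied per query, so nothing needs to be forked per cluster.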
For a small, fixed number of clusters it’s possible to create dataset forks, though this obviously gets complicated if you want an arbitrary or dynamic number of clusters (and corresponding datasets), but that’s putting the cart before the horse.
Am I simply not understanding what a labelset is for, or how it’s meant to be used?