Single-class classification

Hello all,
Is there a way in Max to classify incoming sounds into “A” and “Not A”? I followed the “Classifying Sounds with a Neural Network in Max” tutorial (https://www.youtube.com/watch?v=cjk9oHw7PQg).
Based on that, my current approach would be to train the network with sounds from “A” and use a lot of random sounds for “Not A”. But that is obviously a lot of work and probably error-prone. So can I somehow achieve this with just “A” samples?

One idea here is to use distances in a kdtree… build your class A, manually get the k-nearest distance, and when it’s greater than whatever threshold is too much for you, it’s probably not A?

Excluding the whole world is not something I know how to do efficiently :slight_smile:
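
To make the kdtree idea a bit more concrete, here is a minimal Python sketch of the same logic outside Max (scipy’s KDTree standing in for whatever kdtree object you use in the patch; the feature vectors and the distance threshold are placeholder assumptions you would replace with your own analysis):

```python
import numpy as np
from scipy.spatial import KDTree

# Feature vectors for the "A" examples (e.g. one analysis vector per slice).
# Placeholder random data; in practice these come from your audio analysis.
class_a_features = np.random.rand(100, 13)

tree = KDTree(class_a_features)

# Hand-tuned threshold: how far from the nearest "A" example a new sound
# may be before you call it "not A" (placeholder value).
DIST_THRESHOLD = 0.5

def is_class_a(new_features):
    """Return True if the new sound is close enough to the A examples."""
    dist, _ = tree.query(new_features, k=1)   # nearest-neighbour distance
    return dist <= DIST_THRESHOLD

# Example: classify one incoming analysis frame (placeholder "live" features).
incoming = np.random.rand(13)
print("A" if is_class_a(incoming) else "not A")
```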

Hi @dai-nam,

Can you share a bit more about what the sounds are and what the use case is? Give us some more ideas to think about!

T

Hi,
sorry for not having answered earlier.
At the moment I’m trying the kdtree method but need to work on it a bit more. What I’m trying to do is eliminate feedback in the context of a sound installation in which people touch pillars and trigger sounds.

The touching is registered with multiple contact mics on a wire grid mounted on the pillar. The problem is that the volume of the triggered sounds (or people talking loudly) also triggers the contact mics.

My idea was to separate “touch input noise” from “all other noise”. Then I would record a dozen “touch samples” so the network knows when somebody actually touches the wall. At the moment, the input threshold needs to be set relatively high, so light touching is not possible without getting feedback.

Obviously, it’s not a real ML scenario, but we have to make do with the sensors we have.


Sorry, you have 3 signals there:

  • crosstalk from people talking
  • people touching the pillars
  • background noise

I’m curious now if you can share the signal. This sounds to me like a filtering-and-detrending problem, or an impact-classification one. ML can help with the latter.

I’m happy to help; this is a good example for everyone. ‘Machine listening’ is an interesting problem one gets better at through practice, so you would be my scales of the week :slight_smile:

Thank you for being willing to help :slight_smile: Can you elaborate on the “filtering and detrending” classification?

I don’t know if it can be separated into just 3 signals. Theoretically there is: 1) touching, 2) talking, 3) feedback from triggered sounds, 4) vibrations of people walking, 5) any other background noise, which could certainly be subdivided even further.

In the installation the pillars represent different animals and fur is mounted on them. So the piezos should trigger when people initially touch the pillars, but also when they continuously stroke the fur. That’s why impact classification wouldn’t work, I guess.

For the moment, this is a proof of concept; I don’t have the real contact-mic signals or real background noise yet. I recorded dummy sounds of scraping over a microphone. Attached is my current project with the samples. As a basis, I used the 2D Corpus Explorer example from YouTube. At the top of the patch, I added a button to either load the samples or record live sounds into the buffer. At the bottom, I calculate the average of the xy-coordinates.

My idea is to continuously record the microphone input into a short buffer. This buffer would then be continuously added to and removed from the existing plotter (so the live input does not update the pretrained dataset). Then, for each new point I would calculate the distance from that average, and depending on whether it’s above a certain threshold, label the sound as “touch” or “noise”. I haven’t figured out yet how to combine the live input with the pretrained corpus. Hopefully this is understandable.
https://drive.google.com/file/d/1PWs2HRTtMtQMd0sDsAqGuZjDgxL_P_b6/view?usp=sharing
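
Not a Max patch, but here is a small numpy sketch of that distance-from-the-average idea, assuming the 2D plotter coordinates as input; the corpus points and the radius threshold below are made-up placeholders:

```python
import numpy as np

# 2D coordinates of the pretrained "touch" corpus, as they would appear
# in the plotter (placeholder values; yours come from the corpus analysis).
corpus_points = np.array([
    [0.42, 0.55], [0.44, 0.58], [0.40, 0.53], [0.45, 0.56],
])

# The "average" position of the corpus, as computed at the bottom of the patch.
centroid = corpus_points.mean(axis=0)

# Hand-tuned radius: how far from the average a live point may fall
# and still count as a touch (placeholder value).
RADIUS = 0.1

def label_live_point(xy):
    """Label a live 2D point as 'touch' or 'noise' by distance to the centroid."""
    dist = np.linalg.norm(np.asarray(xy) - centroid)
    return "touch" if dist <= RADIUS else "noise"

print(label_live_point([0.43, 0.56]))   # close to the average -> 'touch'
print(label_live_point([0.90, 0.10]))   # far away -> 'noise'
```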

This is a bit tangential to what is being discussed, but part of me wants to think that there’s a non-classifier solution here that would likely work.

Either by looking at descriptor data for each “type” of thing that is happening and filtering out things that don’t meet certain criteria, and/or by combining that with onset detection in similar ways.

If you have some example audio showing what is and isn’t what you want to trigger that would be useful to try to think about the base problem.
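
To illustrate the descriptor-filtering plus onset-detection idea above, here is a rough numpy sketch (not the actual patch): frame RMS acts as a simple level/onset gate, and the spectral centroid acts as the descriptor filter. The thresholds, and the assumption that touches are spectrally darker than speech or playback, are placeholders to be checked against real recordings:

```python
import numpy as np

SR = 44100
FRAME = 1024
HOP = 512

# Placeholder thresholds; tune against real contact-mic recordings.
RMS_THRESHOLD = 0.02        # level gate: ignore frames quieter than this
CENTROID_MAX_HZ = 2000.0    # assumption: touches sit lower in the spectrum

def detect_touch_frames(signal):
    """Return indices of frames that look like touches: loud enough AND dark enough."""
    hits = []
    freqs = np.fft.rfftfreq(FRAME, d=1.0 / SR)
    window = np.hanning(FRAME)
    for i, start in enumerate(range(0, len(signal) - FRAME, HOP)):
        frame = signal[start:start + FRAME] * window
        rms = np.sqrt(np.mean(frame ** 2))
        if rms < RMS_THRESHOLD:
            continue                       # too quiet: nothing worth classifying
        mag = np.abs(np.fft.rfft(frame))
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
        if centroid <= CENTROID_MAX_HZ:
            hits.append(i)                 # loud and low-centroid: call it a touch
    return hits

# Example with a fake signal: a low-frequency thump followed by broadband noise.
t = np.arange(SR) / SR
thump = 0.5 * np.sin(2 * np.pi * 120 * t) * np.exp(-8 * t)
noise = 0.05 * np.random.randn(SR)
print(detect_touch_frames(np.concatenate([thump, noise])))
```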

I need a sample of the signal you care about - that single class you’re trying to isolate. Then the context, so I can apply (highly trained :slight_smile: ) human listening to it. Then I will know which descriptors to start with, or at least I’ll have a hunch. There are no magic bullets in machine listening to start with, let alone in classifying that feed further down the line.

Hey all,
thank you all for your help. I’m sorry I couldn’t reply earlier: the exhibition opened last week and will now run for 4 months, 6 days a week. Nevertheless, I still plan to update the sensor logic, because it could improve the overall experience quite a bit. As soon as I have “real-life samples” I will share them here, but I need to wait for someone to record them for me. So I will get back to this thread as soon as I have updates. Cheers :slight_smile:
