Weighing descriptors in a query

So one of the things I want to add to my dataset querying going forward is the ability to weigh some descriptors more than others.

In practical terms I have 8d of descriptor data (loudness, deriv of loudness, centroid, deriv of centroid, flatness, deriv of flatness, pitch, pitch confidence), which gives me something like this for any given entry:

-23.432014465332031, 0.591225326061249, 93.79913330078125, -2.59576416015625, -12.845212936401367, -1.006386041641235, 88.705291748046875, 0.166997835040092

Loudness/flatness are in dB and centroid/pitch are in MIDI, with the derivs being whatever they feel like being.

So the core use case here would be to say something like: I want pitch to be 2x more important than other descriptors. Or I want loudness/centroid to be 10x more important than other descriptors etc…

So with this, is the idea that I would scale the incoming descriptors, and then create a dupe of the dataset/kdtree with the relevant columns scaled by the same amount, then do a normal query from there?

Like if I wanted pitch to be 50% more important I would multiply my incoming pitch by 1.5x then dump/scale/re-fit the kdtree with the corresponding column also being multiplied by 1.5x?

That somewhat makes sense, though the mechanics of dumping/scaling are a bit fuzzy at the moment.

But say if I wanted to make some columns not weigh anything at all, would I zero out my input and those columns in the dataset? But if I did that wouldn’t that distance (with loads of zero values) pull a query towards them? I suppose I could just eliminate those columns, but I have logic elsewhere where I can filter by columns, so ideally I would keep the overall column/structure intact.
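On the zero-weight question: if a column is multiplied by 0 in both the incoming frame and the dataset, every difference in that column becomes 0 - 0 = 0, so it adds nothing to the Euclidean distance and cannot pull the query anywhere, while the column count stays intact. A minimal Python sketch of that arithmetic, assuming plain Euclidean distance (the `weighted_distance` helper and the two loudness/pitch points are made up for illustration, not FluCoMa objects):

```python
import math

def weighted_distance(a, b, weights):
    # Multiply each per-column difference by its weight, then take the
    # usual Euclidean norm. A weight of 0 removes that column entirely.
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))

query = [-40.0, 64.0]   # loudness (dB), pitch (MIDI)
entry = [-45.0, 60.0]

both_columns = weighted_distance(query, entry, [1.0, 1.0])
loudness_only = weighted_distance(query, entry, [1.0, 0.0])

print(round(both_columns, 3))   # sqrt(5^2 + 4^2)
print(round(loudness_only, 3))  # only the 5 dB difference is left
```

The zeroed column behaves as if it had been removed, so the filtering logic that relies on the 8-column layout would not need to change.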


The core questions are:

  1. in order to weigh a given descriptor, is it as simple as scaling up/down the incoming descriptor and corresponding dataset column by whatever amount you want?
  2. if that much is correct, what is the best way to scale things in this way?

I have to imagine that using one of the scaler objects (normalize/standardize/etc…) would be most efficient since dumping and iterating a dict each time I change the weighting sounds pretty rough, but I don’t want to transform the scaling overall (i.e. I don’t want to normalize or standardize my dataset). As in, I want to keep things in their natural units (dB/MIDI), so just want to scale some of those numbers. Can any of the scaler processes be fudged to do something like that? In this thread from a while back there was some good info on dumping out the scaling factors and massaging them manually, but in that case the units were actually being transformed anyways (robust scaled), so it was just a matter of adjusting that transformation. In this case I just want to apply a multiplier to some columns, straight up.

first you have to compare apples with apples. if your scaling at the moment is as you present it, differences in the hundreds will outweigh differences in the units.

1 has to equal 1 in all dimensions. and that is fishy as I explained many times.

once that is true (if ever) then distances will work, and in theory, you can multiply a value by 2 to make the distance in that dimension half as important. I showed that in many threads, so feel free to search the archives :smiley:

once you have a convincing 1=1, yes that old thread is a great way to hack and use custom scaling from a dictionary.

At the moment, I have 1 = 1 in the dimensions I care about, derivs are much smaller, but I think that makes sense conceptually as they impact things less in their relative scales.

You lost me here a bit.

So with a dummy example of 2 columns/descriptors (loudness/pitch) where my incoming frame is:
-40 64

And my tiny corpus is:
1: -50 80
2: -45 60
3: -30 90

If I apply 2x scaling to pitch, my input frame becomes -40 128 and the corpus becomes:
1: -50 160
2: -45 120
3: -30 180

Wouldn’t the distance in the pitch column, now being larger numbers, have a larger impact on the match than the smaller distance between the dB values?
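The dummy example can be checked by hand. A small Python sketch, assuming plain Euclidean distance as the match criterion (the `scale` and `distances` helpers are hypothetical names, not part of any FluCoMa API):

```python
import math

query = (-40.0, 64.0)  # loudness (dB), pitch (MIDI)
corpus = {1: (-50.0, 80.0), 2: (-45.0, 60.0), 3: (-30.0, 90.0)}

def scale(point, multipliers):
    # Apply one multiplier per column.
    return tuple(v * m for v, m in zip(point, multipliers))

def distances(query, corpus, multipliers):
    # Scale the query and every corpus entry by the same multipliers,
    # then measure ordinary Euclidean distance.
    q = scale(query, multipliers)
    return {k: math.dist(q, scale(p, multipliers)) for k, p in corpus.items()}

plain = distances(query, corpus, (1.0, 1.0))
pitch_x2 = distances(query, corpus, (1.0, 2.0))

for key in corpus:
    print(key, round(plain[key], 2), round(pitch_x2[key], 2))
```

Entry 2 is the nearest in both cases here, but its pitch difference accounts for 16 of 41 units of squared distance before scaling and 64 of 89 after, so the scaled column does carry more of the distance.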

I re-read through that thread today and looked over a bunch of old ones, but I don’t remember seeing any examples that applied just a simple multiplier to a dataset, independent of a regular transformation (normalize/standardize/etc…). Are you saying that you can adjust the scaling outputs of (say) fluid.normalize~ such that it leaves some columns completely untouched and then only applies a multiplier to other columns?

As in, literally doing something like this to an 8d dataset:
“1x 1x 1x 1x 1x 1x 5x 1x”

nearest neighbour is Euclidean distance in the kd-tree, so you have to make sure your distances are right first. So the 2 concepts explained:

1=1=1: if you have log-centroid in midi-cents and loudness in dB, it is the least bad option as it considers 1 dB = 1 semitone. it is not true, but less false.

so with
item-A: -40 50
item-B: -40 49
item-C: -39 50

and you query -39 49, your distance of query-to-a will be 1.414, query-to-b will be 1, query-to-c will be 1. so you have a tie between 2 nearest neighbours

if you scale a dimension, you skew the distance. the right way would be to scale the actual distance for each dimension before you calculate the Euclidean multidimensional distance, which you can’t do at the minute. The 2nd, less bad option is to scale the actual dimension.

it is bad in both cases, worse in the 2nd case, but the 1=1 assumption is so bad to start with that often people will scale. then you are in real trouble, but the 2nd case is less problematic because your query in and out are scaled by the same amount.

in our case, in absolute value, let’s say you want to prioritise the first column so you scale the 2nd by 2. suddenly, your query is -39 98 and your q-to-a is 2.24 (sqrt(1^2 + 2^2)), your q-to-b is 1 (sqrt(1^2 + 0^2)), your q-to-c is 2 (sqrt(0^2 + 2^2)). so b is the winner.

does that make sense?
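The three-item example can be recomputed in a few lines. This is only a sketch of the arithmetic, assuming plain Euclidean distance; the `ranked` helper is a made-up name:

```python
import math

items = {"item-A": (-40, 50), "item-B": (-40, 49), "item-C": (-39, 50)}
query = (-39, 49)

def ranked(multipliers):
    # Scale query and items by the same per-column multipliers and
    # return (distance, name) pairs, nearest first.
    q = [v * m for v, m in zip(query, multipliers)]
    out = []
    for name, point in items.items():
        p = [v * m for v, m in zip(point, multipliers)]
        out.append((round(math.dist(q, p), 3), name))
    return sorted(out)

unscaled = ranked((1, 1))  # B and C tie at 1.0, A is at 1.414
scaled = ranked((1, 2))    # second column doubled

print(unscaled)
print(scaled)
```

With the second column doubled, item-B (1.0) beats item-C (2.0) and item-A (2.236). Note that item-B is the item that agrees with the query exactly in the second column, the one that was scaled up.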

yes that is what my example was doing. Note that the scaling needs to happen both for the pre-kdtree-making, and for the querying buffer

It’s just the “scaling of the dimensions” is a counterintuitive way to go about doing this, rather than “weighing” them directly.

I’ve just worked out a cool thing in the latest (about to be released) SP-Tools that can do descriptor replacing, which works really well for “forcing” a melodic contour onto incoming data that has none (e.g. drums), so when doing this I want to also weigh the pitch descriptor higher as to influence the matching in a more deliberate way when being used to query a melodic corpus. At the moment it works and sounds good but in a subtle way.

Ah right, this one here. I just saw standardize and assumed it was impacting the standardized projection.

Also, it seems like the way you’re doing the math here the input to this “hack” is actually oriented in a “weight” format. Where 0.5 1.0 is making the first value weigh half as much (by producing an output that is twice as big).

p.s. is this a hint for future changes, or just a general statement?

a general statement. Weighing of dimensions is already dodgy and unprincipled, as I think I’ve made clear here, so adding a feature like this would give, like in other packages, the false impression that it is sound. It isn’t: it is biased, and needs exploring and biasing.

To ‘weight’ you need to change the relative values of the distances. hence the inverse relation to multiplier (making distances larger) and the ‘weight’ of that dimension. I hope you appreciate my use of quotation marks around ‘weight’ :smiley:

Following on from this, is there a way to massage the values such that you can query the furthest matches in the kdtree?

I basically added a new thing in SP-Tools v0.9 where you can do nearest-neighbor corpus matching, but rather than playing back the sample, the sample is loaded as an IR and the audio is convolved through it (per match/attack), which sounds really cool (vid example with timestamp). I was thinking that it may be cool, in this context, to bring back a sample that is not the closest match, to increase the contrast with regards to the convolution.

Conceptually the “opposite” of a sound is obviously a fuzzy and likely non-perceptual thing (when it comes to numbers in a computer), but presumably the furthest match in a kdtree is a knowable/computable thing, and that could potentially act proxy for that.

This may be more about kdtree querying rather than descriptor weighing/massaging, but wondering if there’s a scaling hack to achieve the desired results and/or if there’s a way to get the kfurthestdist(?!) from fluid.kdtree~.

yes. ask for all of them and read the list backwards. it doesn’t cost anything more IIRC

Well how about that! Didn’t think to try -1 for the amount.

I guess the unfortunate downside to this is that it exclusively picks samples along the perimeter (when mousing). I imagine something similar would happen with regards to multidimensional/“real” input where points near the “center” of the space will likely never get picked.

Maybe some fuzzy math where I request all, then randomly pick x amount of samples in the length - (length / 2) range as for something like this it’s not integral (or possible) to have the “opposite” sound, but rather a “pretty distant” sound.

consistent and considerate creative interface is what my jam is :slight_smile:

or use the max distance range and take the last item. that way you know that you are that far.
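Both ideas (read the full neighbour list backwards, or cap the search with a maximum distance and take the last item) can be sketched outside Max. The Python below is a brute-force stand-in for a kdtree query, not the fluid.kdtree~ interface; every function name is hypothetical and the 4-point dataset is a toy:

```python
import math
import random

def neighbours_by_distance(query, dataset):
    # All entries, nearest first (what asking for every neighbour returns).
    return sorted(dataset, key=lambda name: math.dist(query, dataset[name]))

def pick_distant(query, dataset, rng, far_fraction=0.5):
    # Random pick from the furthest part of the list: "pretty distant"
    # rather than strictly the furthest.
    ordered = neighbours_by_distance(query, dataset)
    start = int(len(ordered) * (1.0 - far_fraction))
    return rng.choice(ordered[start:])

def furthest_within(query, dataset, radius):
    # Furthest entry that is still inside a maximum distance.
    inside = [n for n in neighbours_by_distance(query, dataset)
              if math.dist(query, dataset[n]) <= radius]
    return inside[-1] if inside else None

dataset = {"item-0": (0.0, 0.0), "item-1": (1.0, 0.0),
           "item-2": (0.0, 1.0), "item-3": (1.0, 1.0)}
query = (0.1, 0.9)

ordered = neighbours_by_distance(query, dataset)
furthest = ordered[-1]
distant = pick_distant(query, dataset, random.Random(0))
capped = furthest_within(query, dataset, 1.0)
print(furthest, distant, capped)
```

Picking randomly from the far half trades the guarantee of maximum contrast for more variety, which may help with the perimeter-only behaviour described above.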

Ok, finally got around to implementing this and after a bunch of testing, it seems like the opposite is the case. Or rather, the opposite of (my understanding of) your explanation.

I understood you to be saying that if I wanted a descriptor (say Pitch) to be more meaningful, I need to scale down the value of it, both in the incoming descriptors and corresponding dataset.

However, when I do this, it doesn’t seem to work. It wasn’t until I started playing with all the ranges and values, that I got it to do what I wanted. It seems like my initial thinking where scaling the numbers up would make them more meaningful appears to be the case.

Here’s the incoming descriptors I’m working with (pre/post scaling):
[screenshot: incoming descriptors, pre/post scaling]

And here’s the corresponding datasets:

In this case the scaling is absolutely gigantic, applying a std of 0.00001 to Pitch and Pitch Confidence (the last 2 columns in the dataset), so perhaps a less extreme scaling of this is useful, but wanted to get to the bottom of getting it working right.

So did I misunderstand your initial explanation (and patch) or is there something else happening here?
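One way to reason about the observation: assume the same multiplier is applied to column i of both the incoming frame q and a dataset entry x. The symbols m_i, q and x are introduced here for illustration only. The Euclidean distance the kdtree sees is then

```latex
d_m(q, x) = \sqrt{\sum_{i=1}^{n} \bigl(m_i q_i - m_i x_i\bigr)^2}
          = \sqrt{\sum_{i=1}^{n} m_i^{2}\,(q_i - x_i)^2}
```

so a column with m_i > 1 contributes m_i^2 times more to the squared distance, and a column with m_i < 1 contributes less. If the standardizer divides each column by its std, a std of 0.00001 corresponds to m_i = 100000, which would be consistent with the last two columns dominating the match.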

this makes no sense. So there is a bug in the patch. Scaling will make distances and range smaller, so neighbours more likely.

Make a toy example with the patch and see if you get what you expect. Like 4 items, 2d,

0 0
0 1
1 0
1 1

and then query and scale. if you get what you expect there, then the maths work. I did when I tried it when I did the patch. you can even do it manually. Query 0.1 0.9 as an example.

I’m jumping in a meeting but I’ll do that later if you have caught up :slight_smile:

ok the maths were right but the ratios were wrong. inside the Standardize we DIVIDE by the STDDEV, not multiply. So your experimentation works. SC example commented below:

Make a dataset with 4 items of 2d:

d = FluidDataSet(s).load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom(4.collect{|i|["item-%".format(i), [i.mod(2),i.div(2)]]}.flatten)]),{d.print})

item-0 = 0 0
item-1 = 1 0
item-2 = 0 1
item-3 = 1 1

make and fit a tree

k = FluidKDTree(s).fit(d)

make and fill a buffer then query nn and their distances

b = Buffer.sendCollection(s,[0.1,0.9],action:{k.kNearest(b,0,{|i|i.postln;k.kNearestDist(b,0,{|j|j.postln})})})

[item-2, item-0, item-3, item-1]
[0.14142137765884, 0.90553849935532, 0.90553849935532, 1.2727922201157]

Make a standardizer with a std of 10 on col 0 (since we divide by the std, this makes col 0 ten times smaller)

w = FluidStandardize(s).load(Dictionary.newFrom([\mean, [0, 0], \std, [10, 1], \cols, 2]),{w.dump})

dest buffer and ds

c = Buffer(s)
e = FluidDataSet(s)

scale the DS


item-0 = 0 0
item-1 = 0.1 0
item-2 = 0 1
item-3 = 0.1 1

scale the query point:


[0.0099999997764826, 0.89999997615814]

re-fit the tree with the new ds




[item-3, item-2, item-1, item-0]
[0.10000002384186, 0.14142137765884, 0.89999997615814, 0.90553849935532]
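The toy example can be reproduced without a server. The Python sketch below assumes the standardizer computes (value - mean) / std per column, as described above; `standardize` and `ranking` are hypothetical helper names, not FluCoMa calls:

```python
import math

def standardize(point, mean, std):
    # Subtract the mean and DIVIDE by the std, column by column.
    return [(v - m) / s for v, m, s in zip(point, mean, std)]

dataset = {"item-0": [0, 0], "item-1": [1, 0], "item-2": [0, 1], "item-3": [1, 1]}
mean, std = [0, 0], [10, 1]
scaled_ds = {k: standardize(v, mean, std) for k, v in dataset.items()}

def ranking(query):
    # (distance, name) pairs against the scaled dataset, nearest first.
    return sorted((round(math.dist(query, p), 4), k) for k, p in scaled_ds.items())

raw_query = [0.1, 0.9]
with_raw_query = ranking(raw_query)
with_scaled_query = ranking(standardize(raw_query, mean, std))

print(with_raw_query)
print(with_scaled_query)
```

Recomputed this way, the posted list (item-3, item-2, item-1, item-0 at 0.1, 0.1414, 0.9, 0.9055) matches the ranking for the raw query [0.1, 0.9] against the scaled dataset. With the scaled query [0.01, 0.9] the order comes out as item-2, item-3, item-0, item-1. The transform and query lines are missing from the post, so which buffer was actually queried cannot be confirmed from the text. Either way, col 0 shrinks by 10x and counts for less.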

I hope this helps. I’m sorry my maths and hack didn’t line up before

Ok cool, I’ll implement what I have and how I have it.

It now makes me wonder re:derivatives/confidence. In my original thinking I have all the “main” descriptors as dB/MIDI, so roughly at the same scale, but I left the derivatives as they were (typically between -1. and 1. given my time scales) and confidence is 0. to 1. by default. My thinking was that the bigger values would be most important (being bigger numbers) and the smaller numbers (derivatives and confidence) would just add a bit of spice/context to the larger values.

From what you’re describing, does that mean that the derivatives/confidence have been (~)10x more important in my matching than I think they are?

Or more concretely, with a set of descriptors that looks like:

[ -22.516843795776367, 0.133422538638115, 110.849342346191406, -1.829383254051208, -6.321764945983887, 0.150748446583748, 82.0023193359375, 0.161514565348625 ]
[ -22.167551040649414, -0.143307998776436, 109.581504821777344, -0.792484819889069, -6.456767082214355, -0.041584391146898, 96.772483825683594, 0.248626708984375 ]
[ -38.207412719726562, 8.21878719329834, 55.11175537109375, 1.991895437240601, -42.491928100585938, 0.256345480680466, 27.48699951171875, 0.0 ]
[ -36.412986755371094, -0.389389038085938, 48.427070617675781, -0.19626721739769, -50.625186920166016, 0.537291467189789, 27.48699951171875, 0.0 ]
[ -19.572229385375977, 0.995001494884491, 48.497486114501953, 0.690482676029205, -44.006217956542969, 2.671874761581421, 27.48699951171875, 0.0 ]
[ -19.206121444702148, 0.138714477419853, 52.400932312011719, -2.987270355224609, -43.885684967041016, -0.719111740589142, 27.48699951171875, 0.0 ]

[loudness, loudness_deriv, centroid, centroid_deriv, flatness, flatness_deriv, pitch, pitch_confidence]

Are the derivatives/confidence the most important descriptors being matched?
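One way to check this on real numbers is to split the squared Euclidean distance between two entries into its per-column parts. The Python sketch below does that for the first two rows listed above, in their raw units with no scaling applied; it is an illustration of the arithmetic only, not something the kdtree reports:

```python
names = ["loudness", "loudness_deriv", "centroid", "centroid_deriv",
         "flatness", "flatness_deriv", "pitch", "pitch_confidence"]

# First two rows of the descriptor data listed above.
a = [-22.516843795776367, 0.133422538638115, 110.849342346191406,
     -1.829383254051208, -6.321764945983887, 0.150748446583748,
     82.0023193359375, 0.161514565348625]
b = [-22.167551040649414, -0.143307998776436, 109.581504821777344,
     -0.792484819889069, -6.456767082214355, -0.041584391146898,
     96.772483825683594, 0.248626708984375]

# Each column's share of the squared distance is its squared difference.
contributions = {n: (x - y) ** 2 for n, x, y in zip(names, a, b)}
total = sum(contributions.values())

for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name:17s} {value:10.4f} {100 * value / total:6.2f}%")
```

For this pair the pitch column accounts for nearly all of the squared distance, and the derivative and confidence columns for very little. That is one pair only, so running the same breakdown over more pairs would give a better picture.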

you remember me saying 1=1 everywhere? :wink:

Indeed if your std diff is super small it will make a distance super small and will weigh against the others… so you are back to normalising your ranges. but that is also bad because you lose the relative range of each dimension.

welcome to data science hell :smiley:

you can scale up or down the ‘weight’ of those STDdev dimensions now and see if that changes much.


To clarify this, in my case, is the derivative of loudness having more impact on the distance than loudness itself? (deriv of dB vs dB)

With the scaling stuff in place it’s easy enough to massage those numbers, but just to make sure I understand.

If that’s the case, in my specific case I would just 10x the derivs/confidence to get them in the same ballpark, but perhaps I want them to add only some “spice” to the matching, so 20 or 30x.

It’s hard to test this musically as it’s finding matches either way, and with some of these samples, I can’t really (perceptually) tell if the deriv is having more impact than it should be.

as we discussed above: small distances will have greater impact.

my 2c: if you can’t hear the diff, remove the dim :smiley:

Ok, doing some testing in context, but it’s still not adding up.

Here’s an example that’s slightly modified from your original patch that shows what I mean.

I explain it in the vid (with a spoken mistake in range at the start saying “0 to 5”), but in case it’s not clear, I’m generating a dataset where there are 8 columns, somewhat mimicking my actual data. So the values in 1/3/5/7 are random between 0. and 100. and columns 2/4/6/8 are between 0. and 1. I generate 1000 points this way, so I have data that looks like:
[screenshot: sample of the generated dataset]

I then generate a random “input” point where it mirrors the same pattern but the 7th column is always 69. Like so:
[screenshot: example random input point]

I then scale the 7th column up/down via the fluid.standardize~ hack and when I have large values in the 7th column (e.g. “9.40 0.80 38.88 0.10 88.76 391.20 0.37”) for both the dataset and the incoming values, it matches that column much more closely.

If I do the opposite and have tiny values in the 7th column, I effectively get random matches for that column since it is weighing very little in the distance matching.
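The same experiment can be approximated outside the patch. The Python sketch below is a brute-force stand-in under stated assumptions: uniform random data in the ranges described above, plain Euclidean nearest neighbour instead of fluid.kdtree~, and a direct per-column multiplier instead of the fluid.standardize~ hack. All names are made up for illustration:

```python
import math
import random

rng = random.Random(1)

def random_point():
    # Columns 1/3/5/7 in 0..100, columns 2/4/6/8 in 0..1.
    return [rng.uniform(0, 100) if i % 2 == 0 else rng.uniform(0, 1)
            for i in range(8)]

dataset = [random_point() for _ in range(1000)]

def column7_error(multiplier, queries):
    # Mean absolute difference in the 7th column between each query and
    # its nearest neighbour, with the 7th column scaled by `multiplier`
    # in both the query and the dataset.
    weights = [1.0] * 8
    weights[6] = multiplier
    total = 0.0
    for q in queries:
        sq = [v * w for v, w in zip(q, weights)]
        best = min(dataset, key=lambda p: math.dist(
            sq, [v * w for v, w in zip(p, weights)]))
        total += abs(best[6] - q[6])
    return total / len(queries)

queries = []
for _ in range(20):
    q = random_point()
    q[6] = 69.0  # the 7th column is always 69
    queries.append(q)

error_big = column7_error(100.0, queries)
error_small = column7_error(0.01, queries)
print(error_big, error_small)
```

In this setup a large multiplier on the 7th column makes the matches track 69 closely, while a tiny multiplier leaves that column close to random.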

Here’s a vid showing what I mean: