fluid.umap~ crash with "stupid" data

While playing with the help patch of fluid.umap~ I might have found something suspicious. I guess it is not that likely that this is a “severe” issue, but I thought I should report it anyway… I was trying to build an intuition about how umap clusters different data, so I changed the [random 256] in [p making_random_colour_points] to a [drunk 256] and then tried different step sizes and different attributes for umap. It seems it does not like when all datapoints are the same, as happened to me with [drunk 256 1], which produces a consistent crash when I send the fittransform message.

I’m on Windows 10, using the TB-Alpha-08.

Nice catch, I’m investigating… will update accordingly

edit 1: the problem is not that all the points are equal.

edit 2: it is not happening with 4 dims either… but I just noticed: your cols is 3 while you have 4 dims. That is a known bug; we need to check that people don’t load invalid cols, and it seems you found a way to make one. @weefuzzy I cannot reproduce in SC but I’ll try in Max now.

ok, I wonder how you managed to create that dataset… I cannot reproduce it in either SuperCollider or Max.

Once you have it like this, can you load it and print it, and see whether you get 3 or 4 cols in the resulting dataset?

I see two suspicious things:

  1. [dict umap.help.data.dict] seems to store changes without prompting a save.
  2. with no changes to the help patch I still get 4 columns - but everything works:

    But that seems to be expected because of this:
    Or am I missing something?
    Interestingly, the dataset prints like so:
    …while the loaded dict looks like:
    Something doesn’t add up. :smiley:

edit: the dict remembering thing (point 1 above) seems to have been a one-time thing; it does not happen after Max is restarted.

edit 2: nope, it does remember the dict contents until you restart Max (i.e. if you run the help patch, close everything in Max without quitting the app, then reopen the help patch and look at the dict), which is suspicious.

It basically happens to me every time. Watch this.

edit: and by “it” I mean the 4-dims-but-3-columns thing, not the crash.

ok one thing at a time here :smiley:

  • the corrupted dict is indeed a problem in our Max helpfile code, but we do something healthy and dismiss the extra value :slight_smile:

Now I’m correcting that, and I’ll see if I can reproduce your real crash; I filled datasets in SC with similar values and no crash happened. More soon

Sorry, I may have been unclear, so I added this edit.

Here is how I can make it crash.

it is indeed, but then it is not a real problem here. @weefuzzy there must be something funny with our dict dirty flags, or we might just keep a link to it in the table?

Now I’m trying to crash it à la @balintlaczko

ok, got it, without any drunk: just replace [random 256] with [t 128] and boom. I have special builds, so I can trace it.


ha! found it! it is a known bug: standardizing a dataset full of exactly identical items will give NaNs at the output, and in turn those crash UMAP. We should check for those in the latter. Nice catch!
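For intuition, here is a minimal Python sketch (not the actual FluCoMa code) of why a dataset of identical items produces NaNs under naive standardization:

```python
import numpy as np

# Hypothetical data: every datapoint identical, like the [t 128] patch above.
X = np.full((10, 3), 128.0)

mean = X.mean(axis=0)  # [128., 128., 128.]
std = X.std(axis=0)    # [0., 0., 0.] -- zero variance in every column

# Naive standardization divides by that zero std: 0/0 -> NaN everywhere.
with np.errstate(invalid="ignore"):
    Z = (X - mean) / std

print(np.isnan(Z).all())  # True -- NaNs like these are what crash UMAP downstream
```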


Yes, I just found out that it will crash with any number, and was wondering about the standardization… Shouldn’t a dataset with many identical values standardize to 0 “legally”?

@weefuzzy and @groma will have wise opinions on this. We have a similar behaviour to think about for normalize and robustscale I reckon.

Seems to be OK in sklearn:
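For comparison, a quick sketch of how sklearn handles the same zero-variance data: its StandardScaler guards against a zero standard deviation (treating it as a scale of 1) instead of dividing by it, so constant features come out as 0 rather than NaN.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same all-identical data as in the sketch above.
X = np.full((10, 3), 128.0)
Z = StandardScaler().fit_transform(X)

# Zero-variance columns standardize to 0, not NaN.
print((Z == 0).all())  # True
```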
