Description
I recently came across HDBSCAN, an alternative clustering method. Its documentation promises the following:
Performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN), and be more robust to parameter selection.
In practice this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning -- and the primary parameter, minimum cluster size, is intuitive and easy to select.
And also:
In particular performance on low dimensional data is better than sklearn's DBSCAN
Beyond that, it appears to be essentially a drop-in replacement for DBSCAN, which we currently use. It could therefore be worth exploring as a way to make RGDR more robust and to improve its performance.