
Explore HDBSCAN as a replacement for DBSCAN in RGDR #136

@BSchilperoort

I recently stumbled upon HDBSCAN, an alternative clustering method. Its authors promise the following:

Performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN), and be more robust to parameter selection.

In practice this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning -- and the primary parameter, minimum cluster size, is intuitive and easy to select.

And also:

In particular performance on low dimensional data is better than sklearn's DBSCAN

On top of this, it looks like a near drop-in replacement for DBSCAN, which we currently use. It could be worth exploring whether switching makes RGDR more robust and better performing.

Labels: RDGR (Issues relating to the RGDR module), enhancement (New feature or request)
