scikit-learn has removed Boston data set #11

@jtniehof

The Boston house pricing data set was removed from scikit-learn. Trying to import load_boston, as the chapter 2 examples do, raises an error stating that it was removed in 1.2 and citing this article on problems with the data set. As far as I can tell, the removal is not noted in the scikit-learn changelog. (ETA: its deprecation in 1.0, September 2021, was noted in that changelog.)

Unfortunately the California dataset doesn't work as a direct drop-in, having 8 features instead of 13. The Ames dataset has 80(!) features, which is a lot more interesting than I usually give Ames credit for.

I'm not sure of the best path forward, but the most expedient is probably to implement the workaround of pulling the Boston data from the source and patching the feature names back in, as annoying as it is to continue using it. Otherwise, adapting to California is probably workable (but diverges from the text).
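
For reference, a minimal sketch of what that workaround could look like. This is not taken from the book's repository. The URL and the 22-line preamble are assumptions based on the StatLib copy of the data that the scikit-learn removal message points to, and the names parse_boston and load_boston_from_source are hypothetical. Each record in that file is wrapped across two lines (11 values, then 3), so the sketch flattens all numbers and regroups them 14 at a time.

```python
import urllib.request

import numpy as np

# Assumption: location of the original StatLib copy of the data.
BOSTON_URL = "http://lib.stat.cmu.edu/datasets/boston"
# Assumption: number of descriptive lines before the numeric records.
HEADER_LINES = 22

# The 13 feature names that load_boston used to return.
FEATURE_NAMES = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]


def parse_boston(text, header_lines=HEADER_LINES):
    """Split the raw text into a (n, 13) feature array and a (n,) target."""
    body = text.splitlines()[header_lines:]
    # Flatten every number in the body, then regroup into 14-value records:
    # 13 features followed by the target (MEDV).
    values = np.array(" ".join(body).split(), dtype=float)
    records = values.reshape(-1, 14)
    return records[:, :13], records[:, 13]


def load_boston_from_source(url=BOSTON_URL):
    """Download the raw file and parse it (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_boston(text)
```

Calling data, target = load_boston_from_source() would then stand in for the old load_boston() return values, with FEATURE_NAMES supplying the column labels the chapter 2 examples expect.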
