Predicting Boston House Prices with Machine Learning

This project aims to develop a Machine Learning algorithm to predict the average price of houses in Boston.

Dataset

The data for this project was extracted from the Kaggle dataset. It contains pertinent information about various features of houses in Boston, such as the number of rooms, neighborhood crime rate, proximity to urban centers, and more.

Project Overview

Real estate prices in Boston can be influenced by a multitude of factors, from location and size to local market trends. In this project, I utilized supervised regression learning, a technique where a model learns from labeled data and analyzes the relationship between variables for accurate predictions. The goal is to model this relationship and make precise estimations based on the data.

Exploratory Data Analysis (EDA)

Dataset Overview

The dataset contains the following columns:

RM: Average number of rooms per house in the neighborhood.
LSTAT: Percentage of homeowners in the neighborhood considered of "lower class."
PTRATIO: Ratio of students to teachers in schools in the neighborhood.
MEDV: Median value of owner-occupied homes (target variable).

Dataset

df.shape
# Output: (489, 4)

df.dtypes
# Output:
# RM         float64
# LSTAT      float64
# PTRATIO    float64
# MEDV       float64

Statistical Summary

df.describe()

Data Exploration and Preprocessing

1. Statistical Analysis:

Summary statistics for all features were analyzed.
Relationships between features were visualized using scatter plots and heatmaps.

2. Outlier Analysis:

Boxplots were created for each feature to detect outliers.

3. Normality Tests:

Conducted Shapiro-Wilk and Lilliefors tests for normality.
QQ plots and histograms were generated for visual analysis.

4. Correlation:

Pearson, Spearman, and Kendall correlation methods were applied to study feature relationships.

Models Developed

1. Simple Linear Regression

Example: Predicting house price (MEDV) using the average number of rooms (RM).
Performance: R²: Train = 0.57, Test = 0.60 RMSE: ~99,315

2. Multiple Linear Regression

Features: RM, LSTAT, PTRATIO
Equation: MEDV = Intercept + (Coefficient_RM × RM) + (Coefficient_LSTAT × LSTAT) + (Coefficient_PTRATIO × PTRATIO) Performance: R²: Train = 0.73, Test = 0.68 RMSE: ~96,087

3. Polynomial Regression

Applied a polynomial transformation to model non-linear relationships.
Example: Quadratic regression on RM.

4. Cross-Validation

Used 15-fold cross-validation to assess model generalization.
Best R²: ~69.25% (Multiple Linear Regression).

Key Results

1. Simple Linear Regression:

Moderate fit; suitable for understanding single variable impacts.

2. Multiple Linear Regression:

Improved accuracy by incorporating multiple features.

3. Polynomial Regression:

Captured non-linear trends but increased computational complexity.

Tools and Libraries

Python: Core programming language.
Libraries:
- NumPy, Pandas for data handling.
- Matplotlib, Seaborn, Plotly for visualization.
- Scikit-learn for machine learning algorithms.
- SciPy, Statsmodels for statistical analysis.

Conclusion

The models demonstrate that house prices in Boston are influenced by various factors, with multiple regression providing the most accurate predictions. Further refinements can include feature engineering, hyperparameter tuning, and exploring advanced models like Random Forest or Gradient Boosting.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
README.md		README.md
housing.csv		housing.csv
regressao.ipynb		regressao.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Predicting Boston House Prices with Machine Learning

Dataset

Project Overview

Exploratory Data Analysis (EDA)

Dataset Overview

Dataset

Statistical Summary

Data Exploration and Preprocessing

Models Developed

Key Results

Tools and Libraries

Conclusion

About

Uh oh!

Releases

Packages

Languages

sofiasawczenko/forecasting_house_price

Folders and files

Latest commit

History

Repository files navigation

Predicting Boston House Prices with Machine Learning

Dataset

Project Overview

Exploratory Data Analysis (EDA)

Dataset Overview

Dataset

Statistical Summary

Data Exploration and Preprocessing

Models Developed

Key Results

Tools and Libraries

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages