Machine learning project for predicting used car prices based on technical specifications and market data. The goal is to build and compare several regression models and achieve RMSE < 2500.
Cars can have very different prices depending on mileage, age, engine power, brand, and other parameters. This project demonstrates how ML models can be applied to predict car prices and help buyers or sellers estimate the fair market value.
Workflow:
- Data preprocessing – cleaning, handling categorical and numerical features, scaling & encoding with ColumnTransformer.
- Model training – testing multiple regressors:
  - Linear Regression & Ridge
  - Decision Tree & Random Forest
  - LightGBM (gradient boosting)
- Hyperparameter tuning with GridSearchCV inside Pipeline (no data leakage).
- Model evaluation – comparing performance with RMSE metric, analyzing training and prediction time.
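The preprocessing and tuning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`mileage_km`, `age_years`, `power_hp`, `brand`) and the `alpha` grid are assumptions, and Ridge stands in for whichever regressor is being tuned. Because the `ColumnTransformer` sits inside the `Pipeline`, `GridSearchCV` refits the scaler and encoder on each training fold only, which is what keeps validation data from leaking into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the notebook's.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "mileage_km": rng.integers(5_000, 250_000, n),
    "age_years": rng.integers(0, 20, n),
    "power_hp": rng.integers(60, 300, n),
    "brand": rng.choice(["audi", "bmw", "ford", "opel"], n),
})
price = (
    20_000
    - 0.04 * df["mileage_km"]
    - 500 * df["age_years"]
    + 40 * df["power_hp"]
    + rng.normal(0, 1_000, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    df, price, test_size=0.25, random_state=0
)

numeric = ["mileage_km", "age_years", "power_hp"]
categorical = ["brand"]

# Scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Preprocessing lives inside the pipeline, so it is fitted per CV fold.
pipe = Pipeline([("prep", preprocess), ("model", Ridge())])

search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},  # assumed grid
    scoring="neg_root_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# The scorer is negated RMSE, so flip the sign to report RMSE.
cv_rmse = -search.best_score_
print(f"best alpha: {search.best_params_['model__alpha']}, CV RMSE: {cv_rmse:.0f}")
```

Swapping `Ridge()` for another regressor (for example `lightgbm.LGBMRegressor`) only requires changing the `model` step and the `param_grid` keys; the preprocessing stays the same.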
- Python 3.13
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- LightGBM
- Best performing model: LightGBM
- Achieved RMSE < 2500 (target reached)
- Compared training time vs prediction speed across models
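A comparison of RMSE, training time, and prediction time across models could look something like the sketch below. It is illustrative only: the data is synthetic, the `results` dictionary layout is an assumption, and LightGBM is left out so the example depends on scikit-learn alone. The timings it prints will vary by machine and are not the project's reported numbers.

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the car dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=1000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
    "forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

results = {}
for name, model in models.items():
    # Time the fit and the prediction separately.
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    fit_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    pred = model.predict(X_test)
    predict_s = time.perf_counter() - t0

    # RMSE is the square root of the mean squared error.
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    results[name] = {"rmse": rmse, "fit_s": fit_s, "predict_s": predict_s}

for name, r in results.items():
    print(f"{name:>7}: RMSE={r['rmse']:.3f}  fit={r['fit_s']:.4f}s  predict={r['predict_s']:.4f}s")
```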
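Clone the repo and enter the project directory (the directory name matches the repository name, including capitalization):
- git clone https://github.com/artemxdata/Car-Price-Prediction.git
- cd Car-Price-Prediction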
- python -m venv venv
- source venv/bin/activate # Linux/Mac
- venv\Scripts\activate # Windows
pip install -r requirements.txt
jupyter notebook "Car Price Prediction.ipynb"
├── .gitignore # Git ignore file
├── Car Price Prediction.ipynb # Main notebook with full ML pipeline
├── LICENSE # Project license
├── README.md # Project description and instructions
└── requirements.txt # Minimal dependencies
- Add more feature engineering (engine volume, region, condition, etc.)
- Try additional boosting models (XGBoost, CatBoost)
- Deploy as a simple web app for interactive car valuation
Made for educational and practical purposes.