This project applies machine learning techniques to predict NBA draft outcomes. The goal is to use data-driven models to forecast which players will be drafted and their subsequent performance in the NBA.
- Introduction
- Motivation
- Data Extraction
- Data Cleaning
- Feature Engineering
- Exploratory Data Analysis (EDA)
- Data Transformation
- Model Selection and Training
- Model Evaluation and Interpretation
- Model Deployment
- SHAP Analysis
- Real-World Application
- Challenges and Solutions
- Final Thoughts
This project aims to utilize machine learning to predict NBA draft outcomes, enhancing team decision-making processes and improving roster-building strategies.
Accurate predictions help teams assess rookie potential, ensuring long-term competitiveness.
Data analysis enables more effective identification of promising rookies and improves draft success rates.
As a data analysis enthusiast, I also saw this project as a chance to take a deep dive into the patterns and trends within NBA drafts.
- Connect to SQLite Database
- Extract data from relevant tables: player attributes, team salaries, player salaries, draft, draft combine, and game data.
- Merge draft and draft combine data tables.
- Remove unnecessary columns containing 'set' and 'location'.
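
The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the table names (`draft_history`, `draft_combine_stats`), the join key `player_id`, the left join, and the sample columns are all assumptions, and a small in-memory database stands in for `basketball.sqlite`.

```python
import sqlite3

import pandas as pd


def load_draft_data(conn: sqlite3.Connection) -> pd.DataFrame:
    """Merge draft and combine tables, then drop 'set'/'location' columns."""
    # Table names and the join key are assumptions for illustration.
    draft = pd.read_sql_query("SELECT * FROM draft_history", conn)
    combine = pd.read_sql_query("SELECT * FROM draft_combine_stats", conn)

    # A left join from the combine table keeps attendees who were never drafted
    # (the join type is an assumption; the project only says "merge").
    merged = combine.merge(draft, on="player_id", how="left")

    # Drop every column whose name contains 'set' or 'location'.
    unwanted = [c for c in merged.columns if "set" in c or "location" in c]
    return merged.drop(columns=unwanted)


# Tiny in-memory database with made-up rows, standing in for basketball.sqlite.
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"player_id": [1, 2], "overall_pick": [3, 17]}
).to_sql("draft_history", conn, index=False)
pd.DataFrame(
    {
        "player_id": [1, 2, 3],
        "height_wo_shoes": [78.0, 80.5, 75.0],
        "set_shot_made": [4, 6, 5],            # hypothetical column
        "combine_location": ["Chicago"] * 3,   # hypothetical column
    }
).to_sql("draft_combine_stats", conn, index=False)

merged = load_draft_data(conn)
conn.close()
print(merged)
```

With the real database, the connection would instead be opened with `sqlite3.connect("basketball.sqlite")` and the table names adjusted to match its schema.
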
- Fill Missing Values
- Use mode for categorical data and mean for numerical data.
- Remove Irrelevant Data
- Eliminate redundant rows and columns.
- Verify Data Integrity
- Ensure the dataset's integrity post-cleaning.
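
A minimal sketch of these cleaning steps with pandas is shown below. The column names and sample rows are made up for illustration, and the order of operations (deduplicate first, then impute) is an assumption.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop redundant rows/columns, then impute with mean (numeric) or mode."""
    # Remove duplicate rows first so they do not bias the mean or mode,
    # and drop columns that contain no data at all.
    df = df.drop_duplicates().dropna(axis=1, how="all").copy()

    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Integrity check: no missing values should remain after imputation.
    assert not df.isna().any().any(), "missing values remain after cleaning"
    return df


# Made-up sample: row 3 duplicates row 0, and 'empty' holds no data.
raw = pd.DataFrame(
    {
        "height": [78.0, None, 80.0, 78.0, 82.0],
        "position": ["PG", "C", None, "PG", "C"],
        "empty": [None, None, None, None, None],
    }
)
cleaned = clean(raw)
print(cleaned)
```
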
- Select Relevant Features
- Based on domain knowledge.
- Create New Columns
- Capture position data.
- Calculate Additional Metrics
- E.g., BMI.
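
As a rough illustration of the feature engineering above, the sketch below derives BMI and a primary position. It assumes height in inches, weight in pounds, and hyphenated position strings such as `PG-SG`; these column names and formats are hypothetical and may differ from the actual dataset.

```python
import pandas as pd

# BMI = kg / m^2; with pounds and inches the usual conversion factor is 703.
LB_IN_TO_BMI = 703


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add BMI and a primary-position column (column names are assumptions)."""
    out = df.copy()
    out["bmi"] = LB_IN_TO_BMI * out["weight_lb"] / out["height_in"] ** 2
    # Keep the first listed position, e.g. "PG-SG" -> "PG".
    out["primary_position"] = out["position"].str.split("-").str[0]
    return out


players = pd.DataFrame(
    {
        "height_in": [78.0, 84.0],
        "weight_lb": [220.0, 250.0],
        "position": ["PG-SG", "C"],
    }
)
features = add_features(players)
print(features[["bmi", "primary_position"]])
```
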
- Summary Statistics
- Show distributions of key features.
- Target Variable Analysis
- Analyze the distribution of drafted vs. undrafted players.
- Visualize Relationships
- Correlation and feature relationship visualizations.
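
The numeric side of this analysis can be sketched with pandas alone; plots (for example histograms or a correlation heatmap) would be built on the same tables. The sample values and column names below are invented for illustration.

```python
import pandas as pd

# Made-up sample; 'drafted' is the binary target (1 = drafted, 0 = undrafted).
df = pd.DataFrame(
    {
        "height_in": [78.0, 80.0, 75.0, 74.0, 82.0, 76.0],
        "wingspan_in": [82.0, 85.0, 77.0, 76.0, 88.0, 78.0],
        "drafted": [1, 1, 0, 0, 1, 0],
    }
)

# Summary statistics for every numeric feature.
summary = df.describe()

# Share of drafted vs. undrafted players (checks for class imbalance).
class_share = df["drafted"].value_counts(normalize=True)

# Correlation of each feature with the target, strongest first.
corr_with_target = (
    df.corr()["drafted"].drop("drafted").sort_values(ascending=False)
)

print(summary)
print(class_share)
print(corr_with_target)
```
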
- Normalize/Scale Numeric Features
- Encode Categorical Features
- Use one-hot encoding.
- Split Data
- Into training, validation, and test sets.
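
One way to sketch these transformation steps with scikit-learn is shown below. The 60/20/20 split, the stratification, the synthetic data, and the column names are assumptions; the project does not state its exact proportions. The scaler and encoder are fitted on the training set only so that no information from the validation or test sets leaks into them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (all values are made up).
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame(
    {
        "height_in": rng.normal(78, 3, n),
        "weight_lb": rng.normal(215, 20, n),
        "position": rng.choice(["G", "F", "C"], n),
    }
)
y = pd.Series(rng.integers(0, 2, n), name="drafted")

# 60% train, then split the remaining 40% evenly into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["height_in", "weight_lb"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["position"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Fit on the training set only, then apply the same transform elsewhere.
X_train_t = preprocess.fit_transform(X_train)
X_val_t = preprocess.transform(X_val)
X_test_t = preprocess.transform(X_test)
print(X_train_t.shape, X_val_t.shape, X_test_t.shape)
```
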
- Select Multiple Models
- Logistic regression, decision trees, random forests, SVM, KNN, gradient boosting, XGBoost.
- Train Models
- Using the training dataset.
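
Training several candidate models side by side might look like the sketch below. It uses synthetic data from `make_classification` and default-ish hyperparameters, both of which are assumptions. XGBoost is left out only because it lives in a separate package; an `xgboost.XGBClassifier` could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared draft dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(probability=True, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every candidate on the same training split.
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}

for name, model in fitted.items():
    print(f"{name}: validation accuracy = {model.score(X_val, y_val):.3f}")
```
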
- Evaluate Models
- Metrics: accuracy, precision, recall, F1-score, ROC-AUC, and specificity.
- Compare Models
- Based on key metrics.
- Select Best Model
- Highest recall preferred.
- Feature Importance Analysis
- Identify key predictive factors.
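
The metrics listed above can be computed as in the following sketch. scikit-learn has no dedicated specificity function, so it is derived from the confusion matrix as TN / (TN + FP). The labels, predictions, and scores for the two models "A" and "B" are invented purely to show the comparison and the recall-based selection.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_pred, y_score):
    """Return the comparison metrics for one model."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_score),
        "specificity": tn / (tn + fp),
    }


# Made-up labels (1 = drafted) and outputs from two hypothetical models.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
outputs = {
    "A": ([1, 1, 0, 0, 0, 0, 1, 1], [0.9, 0.8, 0.4, 0.2, 0.1, 0.3, 0.6, 0.7]),
    "B": ([1, 1, 1, 1, 0, 0, 1, 1], [0.9, 0.8, 0.6, 0.55, 0.1, 0.3, 0.6, 0.7]),
}

results = {
    name: evaluate(y_true, pred, score)
    for name, (pred, score) in outputs.items()
}

# Select the model with the highest recall.
best = max(results, key=lambda name: results[name]["recall"])
print(best, results[best])
```

Here model "B" is selected because it misses no drafted player, even though its specificity is lower than that of model "A".
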
- Save Best-Performing Model
- Real-Time Predictions
- Load and use the model for predictions.
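
Saving and reloading a model could be done with `joblib`, as in this sketch. The file name, the logistic regression model, and the synthetic data are placeholders; the project does not say which serialization format it uses.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Placeholder model trained on synthetic data.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "draft_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)      # save the fitted model
    loaded = joblib.load(model_path)    # load it back for inference

# Score one "new" player (here simply the first synthetic row).
new_player = X[:1]
prob_drafted = loaded.predict_proba(new_player)[0, 1]
print(f"Estimated probability of being drafted: {prob_drafted:.2f}")
```

Any preprocessing (scaling, encoding) has to be applied to new data exactly as it was during training, so in practice it is convenient to save the preprocessing steps together with the model, for example in a single scikit-learn `Pipeline`.
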
- Generate SHAP Values
- Explain model predictions.
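
For reference, SHAP values are based on Shapley values from cooperative game theory. The general definition is given below; the project's specific SHAP setup (explainer type, background data) is not described here, so this is only the underlying formula. For a feature $i$ in the feature set $N$, with $v(S)$ denoting the model's expected output when only the features in $S$ are known:

$$
\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
$$

The contributions are additive: for a single prediction $f(x)$ and baseline $\phi_0$ (the average model output),

$$
f(x) = \phi_0 + \sum_{i \in N} \phi_i
$$

so each $\phi_i$ can be read as how much feature $i$ pushed this player's prediction above or below the baseline.
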
- Data Collection
- Gather actual data for new players.
- Feature Engineering
- Standardize and engineer features.
- Predict Draft Position
- Use the model to generate predictions and SHAP analysis to explain them.
- Missing Data
- Apply imputation techniques.
- Model Generalization
- Use cross-validation to detect overfitting and obtain more reliable performance estimates.
- Balancing Metrics
- Prioritize recall so that fewer players who go on to be drafted are missed, accepting some loss in precision.
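
Cross-validation with recall as the scoring metric can be sketched as follows. The five stratified folds, the logistic regression model, and the imbalanced synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, mildly imbalanced data (about 30% positive class).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.7, 0.3], random_state=0
)

# Stratified folds keep the drafted/undrafted ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
recalls = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="recall"
)

print("Recall per fold:", recalls.round(3))
print(f"Mean recall: {recalls.mean():.3f} (std {recalls.std():.3f})")
```

A large spread between folds, or a training score far above the cross-validated score, suggests the model is not generalizing well.
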
This project demonstrates the power of data analysis and machine learning in transforming NBA draft predictions. Continuous learning and adaptation are crucial for success in rapidly evolving fields.
Data Download
For those interested in exploring the data used in this project, the dataset provided by one of the authors can be downloaded from Kaggle: https://www.kaggle.com/code/edwinstanzah/nba-draft-prediction-part-1-getting-the-data/input?select=basketball.sqlite
Thank you for reviewing this project. For more details, please refer to the slides included.