This project analyzes how advertising expenditures (TV, Radio, Newspaper) impact product sales using linear regression. The analysis answers key business questions and evaluates different modeling approaches.
Result:
The average expenditure on TV advertising was $147.04.
This indicates TV receives the largest share of the advertising budget.
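The average can be reproduced with a single pandas aggregation. The sketch below is illustrative only: the rows are made-up values rather than the project's data, and the column names `TV`, `Radio`, `Newspaper`, and `Sales` are assumptions about how the spreadsheet is laid out. Computing all three channel means side by side is what supports a statement about which channel receives the largest share.

```python
import pandas as pd

# Made-up stand-in rows (NOT the project's data). In the notebook the frame
# would presumably come from the spreadsheet instead; column names are assumed.
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5],
    "Radio": [37.8, 39.3, 45.9, 41.3],
    "Newspaper": [69.2, 45.1, 69.3, 58.5],
    "Sales": [22.1, 10.4, 9.3, 18.5],
})

# Mean spend per channel, then the channel with the highest mean.
channel_means = df[["TV", "Radio", "Newspaper"]].mean()
top_channel = channel_means.idxmax()

print(channel_means.round(2))
print("Largest average spend:", top_channel)
```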
Result:
The correlation between Radio advertising and Sales is 0.35.
This moderate positive correlation suggests that higher radio spend is associated with somewhat higher sales.
Result:
TV advertising has the strongest correlation with sales (0.90).
This strong correlation suggests TV is the channel most closely tied to sales, although correlation alone does not establish that TV ads cause the sales.
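Both correlation results can be obtained from one correlation matrix. The following is a minimal sketch on synthetic data, not the project's dataset: the column names and the coefficients used to generate `Sales` are assumptions chosen only so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the project's dataset); column names are assumed.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
df["Sales"] = 5 + 0.05 * df["TV"] + 0.1 * df["Radio"] + rng.normal(0, 1.5, n)

# Pearson correlation of each channel with Sales, strongest first.
sales_corr = df.corr()["Sales"].drop("Sales").sort_values(ascending=False)
strongest = sales_corr.index[0]

print(sales_corr.round(2))
print("Strongest correlation with Sales:", strongest)
```

Sorting the `Sales` column of the matrix answers "which channel correlates most strongly" directly, and the Radio entry of the same series is the single-pair correlation.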
Results:
- R-squared: 0.906
- Mean Squared Error: 2.90
- Root Mean Squared Error: 1.70
Key Coefficients:
- TV: 0.0545 (most significant)
- Radio: 0.1010
- Newspaper: -0.0043
The model explains about 90.6% of sales variance (R-squared of 0.906), with TV being the most influential predictor.
Visualization shows strong alignment between predicted and actual sales.
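The metrics above fit together arithmetically: RMSE is the square root of MSE, and the square root of 2.90 is about 1.70. A minimal sketch of how such a model and its metrics might be produced with scikit-learn follows. It runs on synthetic data, and the 80/20 train/test split and the `random_state` are assumptions, since this README does not state how the model was evaluated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),   # TV
    rng.uniform(0, 50, n),    # Radio
    rng.uniform(0, 100, n),   # Newspaper
])
y = 5 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 1.5, n)

# ASSUMPTION: hold out 20% of rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is defined as the square root of MSE

print(f"R-squared: {r2:.3f}  MSE: {mse:.2f}  RMSE: {rmse:.2f}")
for name, coef in zip(["TV", "Radio", "Newspaper"], model.coef_):
    print(f"{name}: {coef:.4f}")
```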
Input:
- TV: $200
- Radio: $40
- Newspaper: $50
Predicted Sales: 19.87 units
This helps budget allocation decisions for optimal sales impact.
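A linear regression prediction is the intercept plus each coefficient multiplied by its input. The sketch below applies the coefficients listed in this README to the budget above. The intercept is not reported here, so the last step only derives an implied value, under the assumption that these rounded coefficients are the ones behind the 19.87 prediction and that they apply to raw dollar amounts.

```python
# Coefficients, inputs, and prediction as reported in this README.
coefficients = {"TV": 0.0545, "Radio": 0.1010, "Newspaper": -0.0043}
budget = {"TV": 200, "Radio": 40, "Newspaper": 50}
reported_prediction = 19.87

# Contribution of each channel = coefficient * spend.
contributions = {ch: coefficients[ch] * budget[ch] for ch in coefficients}
channel_total = sum(contributions.values())

# ASSUMPTION: the remainder is the model intercept. This holds only if the
# rounded coefficients above produced the reported prediction on raw inputs.
implied_intercept = reported_prediction - channel_total

for ch, value in contributions.items():
    print(f"{ch:<10} {value:>8.3f}")
print(f"Channel total:     {channel_total:.3f}")
print(f"Implied intercept: {implied_intercept:.3f}")
```

Under those assumptions the three channels contribute about 14.73 units in total, most of it from TV, which leaves roughly 5.1 units attributable to the intercept.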
Result:
Normalization produced identical performance metrics:
- R-squared: 0.906
- MSE: 2.90
Rescaling the features does not change the fit of an ordinary least squares regression; it only rescales the coefficients, which makes them easier to compare when features are on different scales.
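The invariance can be checked directly. The sketch below fits the same ordinary least squares model on raw and on standardized features and compares the results. It uses synthetic data, and `StandardScaler` is an assumption: this README does not say which normalization method the notebook uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),   # TV
    rng.uniform(0, 50, n),    # Radio
    rng.uniform(0, 100, n),   # Newspaper
])
y = 5 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 1.5, n)

# Fit on raw features.
raw_model = LinearRegression().fit(X, y)
r2_raw = r2_score(y, raw_model.predict(X))

# Fit on standardized features (zero mean, unit variance per column).
X_scaled = StandardScaler().fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)
r2_scaled = r2_score(y, scaled_model.predict(X_scaled))

# Same fit; each scaled coefficient equals the raw one times the feature's std.
same_fit = np.isclose(r2_raw, r2_scaled)
same_coefs = np.allclose(scaled_model.coef_, raw_model.coef_ * X.std(axis=0))

print(f"R-squared raw: {r2_raw:.6f}  scaled: {r2_scaled:.6f}")
print("Scaled coefficients:", scaled_model.coef_.round(3))
```

After standardization each coefficient reads as the change in sales per one standard deviation of spend, so the channels can be compared on a common footing.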
Results:
- R-squared: 0.1097
- MSE: 27.51
Removing TV drops R-squared from 0.906 to 0.1097 and raises MSE from 2.90 to 27.51, which confirms TV's critical role in sales prediction.
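This kind of ablation amounts to refitting the model on a reduced feature set and comparing the metrics. A minimal sketch on synthetic data follows; the data-generating coefficients are assumptions chosen for illustration, and both models are scored on the data they were fitted on to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic stand-in data (NOT the project's dataset).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0, 300, n),   # TV
    rng.uniform(0, 50, n),    # Radio
    rng.uniform(0, 100, n),   # Newspaper
])
y = 5 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 1.5, n)

def fit_and_score(features, target):
    """Fit OLS on the given features and return (R-squared, MSE)."""
    model = LinearRegression().fit(features, target)
    pred = model.predict(features)
    return r2_score(target, pred), mean_squared_error(target, pred)

r2_full, mse_full = fit_and_score(X, y)
# Drop column 0 (TV) and keep Radio and Newspaper only.
r2_no_tv, mse_no_tv = fit_and_score(X[:, 1:], y)

print(f"All channels: R-squared={r2_full:.3f}  MSE={mse_full:.2f}")
print(f"Without TV:   R-squared={r2_no_tv:.3f}  MSE={mse_no_tv:.2f}")
```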
- TV advertising drives the most sales
- Newspaper ads show negligible impact (negative coefficient)
- The full model explains about 90% of sales variation (R-squared of 0.906)
- Removing TV from predictors makes the model ineffective
- Clone the repository
- Install requirements:
  pip install -r requirements.txt
- Run the notebook:
  jupyter notebook sales_prediction.ipynb
- Python 3.7+
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- sales_prediction.ipynb: Jupyter notebook with full analysis
- advertising_sales_data.xlsx: Dataset
