📊 Predicting telecom customer churn to enable targeted retention campaigns — XGBoost & feature engineering.

SantiagoPulidoH/Telecom_Churn_Prediction


📉 Telecom Customer Churn Prediction — Interconnect

TL;DR
Binary classification project to predict customer churn for Interconnect (telecom). The goal is to identify at-risk customers so marketing can run targeted retention offers (discounts, special plans). This repo contains the analysis notebook, modelling experiments, and instructions to reproduce the results. The data is not included for privacy reasons — see data/README.md.


πŸ” Problem statement

Churn prediction helps reduce voluntary customer cancellations and increase lifetime value. We frame the problem as a supervised classification task: given customer attributes, services and contract details, predict whether a customer will churn in the next billing period.

Business goal: maximise precision among the top decile of customers ranked by predicted churn risk (Precision@K, with K = 10%), so the retention budget is prioritised efficiently.


📦 Dataset (not included)

Files provided by Interconnect (example names):

  • contract.csv β€” contract length, monthly charges, payment method, contract start.
  • personal.csv β€” demographics, tenure, region.
  • internet.csv β€” internet service type (DSL/fibre), add-ons (ProteccionDeDispositivo, SeguridadEnLinea).
  • phone.csv β€” phone service usage, multiple lines.

Each file has customerID as the unique key. A small anonymized sample is available under data/sample/ for testing.
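Assembling the modelling table reduces to successive pandas merges on customerID. A minimal sketch with tiny in-memory stand-ins for the four files (all column names and values below are made up for illustration):

```python
import pandas as pd

# Tiny in-memory stand-ins for contract.csv / personal.csv / internet.csv / phone.csv.
contract = pd.DataFrame({"customerID": ["A1", "B2", "C3"],
                         "MonthlyCharges": [70.0, 29.5, 99.9]})
personal = pd.DataFrame({"customerID": ["A1", "B2", "C3"],
                         "tenure": [12, 3, 48]})
internet = pd.DataFrame({"customerID": ["A1", "C3"],        # B2 has no internet service
                         "InternetService": ["DSL", "Fiber"]})
phone = pd.DataFrame({"customerID": ["A1", "B2"],           # C3 has no phone service
                      "MultipleLines": ["No", "Yes"]})

# Left-join onto the contract table: every customer has a contract, but not
# every customer has internet or phone, so missing services surface as NaN.
df = (contract
      .merge(personal, on="customerID", how="left")
      .merge(internet, on="customerID", how="left")
      .merge(phone, on="customerID", how="left"))

print(df.shape)  # one row per customer, all service columns attached
```

Left joins keep every contracted customer; a NaN in a service column simply means the customer does not subscribe to that service, which can later be encoded as an explicit "No service" category.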


🧰 Tech stack

  • Python (Pandas, NumPy)
  • Scikit-Learn (pipelines, metrics)
  • XGBoost / LightGBM (tree-based models)
  • Matplotlib (plots)
  • Jupyter Notebook

(See requirements.txt for package list.)


🧭 Approach (summary)

  1. Data ingestion & join by customerID.
  2. Exploratory Data Analysis (missing values, distributions, correlations).
  3. Feature engineering: tenure buckets, interaction flags, monthly charge aggregations, service counts.
  4. Class imbalance handling: class weights and sampling strategies.
  5. Temporal / stratified splitting to avoid leakage.
  6. Model training: baseline Logistic Regression → RandomForest / XGBoost → final XGBoost model.
  7. Evaluation: AUC-ROC, Precision@K (business threshold), confusion matrix, calibration.
  8. Explainability: feature importance and SHAP (optional).
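Steps 3–6 above can be sketched end to end. Everything below is illustrative: synthetic data, made-up column names, and a Logistic Regression baseline standing in for the final XGBoost model.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-in for the joined customer table.
df = pd.DataFrame({
    "tenure": rng.integers(0, 72, n),
    "MonthlyCharges": rng.uniform(20, 120, n),
    "num_services": rng.integers(0, 5, n),
})
# Illustrative label: short-tenure, high-charge customers churn more often.
p = 1 / (1 + np.exp(0.05 * df["tenure"] - 0.02 * df["MonthlyCharges"]))
df["churn"] = rng.random(n) < p

# Step 3: simple engineered features (tenure buckets, charge per service).
df["tenure_bucket"] = pd.cut(df["tenure"], bins=[-1, 12, 24, 48, 72], labels=False)
df["charge_per_service"] = df["MonthlyCharges"] / (df["num_services"] + 1)

# Step 5: stratified split preserves the churn rate in both partitions.
features = ["tenure", "MonthlyCharges", "num_services",
            "tenure_bucket", "charge_per_service"]
X_train, X_val, y_train, y_val = train_test_split(
    df[features], df["churn"], test_size=0.25,
    stratify=df["churn"], random_state=0)

# Steps 4 & 6: baseline with class weights to offset the imbalance.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation AUC-ROC = {auc:.3f}")
```

When moving from this baseline to XGBoost, the `scale_pos_weight` parameter plays a role similar to `class_weight="balanced"` for handling the minority churn class.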

πŸ“ˆ Key results / KPIs (final model: XGBoost)

  • AUC-ROC = 0.911 β€” exceeds the project target (β‰₯ 0.88).
  • Accuracy = 0.868 β€” consistent with a classifier that separates both classes well.
  • F1 = 0.757 β€” computed at the optimal validation threshold.
  • ROC & PR curves confirm strong ranking of predicted probabilities; as recall increases, precision decreases progressively (trade-off captured and studied for business thresholds).

Business interpretation: Using the top decile of predicted churners (Precision@10%) enables the marketing team to focus retention incentives where expected uplift is highest, maximising ROI on retention spend.
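Precision@K has no off-the-shelf scikit-learn helper, but it is only a few lines of NumPy. A sketch with made-up labels and scores (the function name is mine, not the project's):

```python
import numpy as np

def precision_at_k(y_true, y_score, k_frac=0.10):
    """Precision among the top k_frac of customers ranked by predicted churn risk."""
    k = max(1, int(len(y_score) * k_frac))
    top_idx = np.argsort(y_score)[::-1][:k]   # indices of the k highest scores
    return float(np.mean(np.asarray(y_true)[top_idx]))

# Toy example: 10 customers; the model ranks true churners (1) mostly on top.
y_true  = np.array([1, 1, 0, 1, 0, 0, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05])

print(precision_at_k(y_true, y_score, k_frac=0.30))  # top 3 scored have labels 1, 1, 0
```

Sorting by predicted probability and averaging the true labels in the top slice is exactly the quantity the marketing team cares about: of the customers contacted with a retention offer, how many would actually have churned.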


▶️ How to reproduce (local)

```bash
git clone https://github.com/<YOUR_USER>/<REPO_NAME>.git
cd <REPO_NAME>

python -m venv .venv
# mac / linux
source .venv/bin/activate
# windows
# .venv\Scripts\activate

pip install -r requirements.txt
jupyter notebook notebooks/01_Churn_Prediction.ipynb
```
