TL;DR
Binary classification project to predict customer churn for Interconnect (telecom). The goal is to identify at-risk customers so marketing can run targeted retention offers (discounts, special plans). This repo contains the analysis notebook, modelling experiments and reproducible instructions. Data is not included for privacy reasons; see data/README.md.
Churn prediction helps reduce voluntary customer cancellations and increase lifetime value. We frame the problem as a supervised classification task: given customer attributes, services and contract details, predict whether a customer will churn in the next billing period.
Business goal: maximize precision among the top decile of predicted churn risk (Precision@K) so the retention budget is spent on the customers most likely to leave.
Files provided by Interconnect (example names):
- `contract.csv`: contract length, monthly charges, payment method, contract start.
- `personal.csv`: demographics, tenure, region.
- `internet.csv`: internet service type (DSL/fibre), add-ons (ProteccionDeDispositivo, SeguridadEnLinea).
- `phone.csv`: phone service usage, multiple lines.
Each file has customerID as the unique key. A small anonymized sample is available under data/sample/ for testing.
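The four files can be combined into a single modelling table with a join on `customerID`. A minimal sketch with pandas, using tiny in-memory frames in place of the real CSVs (which are not in the repo); the column names here are illustrative:

```python
import pandas as pd

# Stand-ins for contract.csv and personal.csv; real data is not included.
contract = pd.DataFrame({
    "customerID": ["0001", "0002", "0003"],
    "MonthlyCharges": [29.85, 56.95, 74.40],
})
personal = pd.DataFrame({
    "customerID": ["0001", "0002", "0004"],
    "tenure": [12, 34, 2],
})

# Left-join on the shared key keeps every contract row; customers missing
# from personal.csv get NaN (e.g. customerID 0003 below).
df = contract.merge(personal, on="customerID", how="left")
print(df.shape)  # (3, 3)
```

Using `how="left"` anchors the table on the contract file; `how="outer"` would instead keep customers that appear in only one of the files (e.g. phone-only accounts).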
- Python (Pandas, NumPy)
- Scikit-Learn (pipelines, metrics)
- XGBoost / LightGBM (tree-based models)
- Matplotlib (plots)
- Jupyter Notebook
(See requirements.txt for package list.)
- Data ingestion & join by `customerID`.
- Exploratory Data Analysis (missing values, distributions, correlations).
- Feature engineering: tenure buckets, interaction flags, monthly charge aggregations, service counts.
- Class imbalance handling: class weights and sampling strategies.
- Temporal / stratified splitting to avoid leakage.
- Model training: baseline Logistic Regression → RandomForest / XGBoost → final XGBoost model.
- Evaluation: AUC-ROC, Precision@K (business threshold), confusion matrix, calibration.
- Explainability: feature importance and SHAP (optional).
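The splitting, imbalance-handling and baseline-training steps above can be sketched end to end. This is a minimal illustration on synthetic data (the repo's real features and final XGBoost model are not reproduced here), using a stratified split and class weights as described:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the joined customer table: ~20% positives,
# mimicking the churn-rate imbalance.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.8, 0.2], random_state=42)

# Stratified split preserves the churn rate in both folds, so minority-class
# metrics are not distorted by an unlucky split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Baseline: scaled logistic regression with class weights to counter imbalance.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"validation AUC-ROC: {auc:.3f}")
```

The same pipeline shape carries over to the tree-based models: swap the estimator, keep the stratified split and the probability-based evaluation.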
- AUC-ROC = 0.911, exceeding the project target (≥ 0.88).
- Accuracy = 0.868 β consistent with a classifier that separates both classes well.
- F1 = 0.757 β computed at the optimal validation threshold.
- ROC & PR curves confirm strong ranking of predicted probabilities; as recall increases, precision decreases progressively (trade-off captured and studied for business thresholds).
Business interpretation: Using the top decile of predicted churners (Precision@10%) enables the marketing team to focus retention incentives where expected uplift is highest, maximising ROI on retention spend.
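Precision@K is not a built-in scikit-learn metric, so a small helper like the following (a sketch, not the repo's exact implementation) can compute it from predicted probabilities:

```python
import numpy as np

def precision_at_k(y_true, y_score, k=0.10):
    """Precision among the top-k fraction of customers ranked by churn score."""
    y_true = np.asarray(y_true)
    n_top = max(1, int(len(y_score) * k))
    # Indices of the n_top highest scores, i.e. the customers to target.
    top_idx = np.argsort(y_score)[::-1][:n_top]
    return float(y_true[top_idx].mean())

# Toy example: the two highest-scored customers are both true churners,
# so precision in the top 20% is 1.0.
y_true = np.array([1, 0, 1, 0, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.1, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35])
print(precision_at_k(y_true, y_score, k=0.20))
```

With `k=0.10` this scores only the top decile, matching the Precision@10% figure used for the business threshold.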
```bash
git clone https://github.com/<YOUR_USER>/<REPO_NAME>.git
cd <REPO_NAME>
python -m venv .venv

# mac / linux
source .venv/bin/activate
# windows
# .venv\Scripts\activate

pip install -r requirements.txt
jupyter notebook notebooks/01_Churn_Prediction.ipynb
```