This project focuses on predicting the histologic stage of liver cirrhosis based on various medical and clinical indicators of a patient. Using a dataset from a Mayo Clinic study on primary biliary cirrhosis (PBC) conducted between 1974 and 1984, machine learning models were trained to classify patients into one of three stages of liver damage:
- Stage 1 → Mild
- Stage 2 → Moderate
- Stage 3 → Severe
- Dataset Source:
  - Mayo Clinic study on Primary Biliary Cirrhosis (PBC).
  - Period: 1974–1984.
- Target Variable:
  - Stage of cirrhosis (1, 2, or 3).
- Features Used: The dataset includes medical and biochemical attributes such as:
  - `N_Days`: Duration from registration to death/transplant/study end
  - `Status`: Patient status – C (Censored), CL (Censored due to liver transplant), D (Death)
  - `Drug`: Drug administered – D-penicillamine or placebo
  - `Age`: Age in days
  - `Sex`: M or F
  - `Ascites`, `Hepatomegaly`, `Spiders`: Presence of medical symptoms (Y/N)
  - `Edema`: Level of edema severity (N/S/Y)
  - Biochemical values: `Bilirubin` (mg/dl), `Cholesterol` (mg/dl), `Albumin` (gm/dl), `Copper` (ug/day), `Alk_Phos` (U/l), `SGOT` (U/ml), `Triglycerides` (mg/dl), `Platelets` (per 1000 ml), `Prothrombin` time (sec)
- Converted categorical features (e.g., `Sex`, `Drug`, `Edema`) to numerical format.
- Handled missing data using appropriate imputation strategies.
- Normalized numerical features for improved model performance.
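The three preprocessing steps above can be sketched with a scikit-learn `ColumnTransformer`. This is a minimal illustration, not the project's actual code: the choice of `OrdinalEncoder`, the median and most-frequent imputation strategies, `StandardScaler`, the subset of columns, and the toy values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Column names follow the feature list; this particular subset and the
# categorical/numeric split are assumptions made for the sketch.
CATEGORICAL = ["Sex", "Drug", "Edema"]
NUMERIC = ["Age", "Bilirubin", "Albumin"]


def build_preprocessor() -> ColumnTransformer:
    """Impute and encode categoricals; impute and scale numeric columns."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder()),
    ])
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    return ColumnTransformer([
        ("cat", categorical, CATEGORICAL),
        ("num", numeric, NUMERIC),
    ])


# Made-up rows with a few missing values, for illustration only.
toy = pd.DataFrame({
    "Sex": ["F", "M", "F", np.nan],
    "Drug": ["Placebo", "D-penicillamine", "Placebo", "Placebo"],
    "Edema": ["N", "S", "Y", "N"],
    "Age": [18000.0, 21000.0, np.nan, 15000.0],
    "Bilirubin": [0.8, 3.2, 1.1, np.nan],
    "Albumin": [3.9, np.nan, 3.4, 4.1],
})

X = build_preprocessor().fit_transform(toy)
print(X.shape)
```

Wrapping the steps in a single transformer means the same imputation values and scaling parameters learned on the training data are reused at prediction time.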
A range of classification models was evaluated, both individually and within a `StackingClassifier` with different meta (final) estimators:
- Models Tested:
  - Logistic Regression → 59.64% accuracy
  - SGD Classifier → 58.38% accuracy
  - LDA → 59.40% accuracy
  - Random Forest → 94.76% accuracy
  - XGBoost → 95.54% accuracy
  - SVM → 84.16% accuracy
  - KNN → 88.42% accuracy
  - Gaussian → 50.54% accuracy
- Base Learners:
  - `RandomForestClassifier`
  - `XGBClassifier`
- Final Estimators Tested:
  - `SVC` → 95.90% accuracy
  - `LogisticRegression` → 95.74% accuracy
  - `SGDClassifier` → 95.64% accuracy
  - `LinearDiscriminantAnalysis` → 95.80% accuracy
  - `RandomForestClassifier` → 95.86% accuracy
  - `XGBClassifier` → 96.14% accuracy
- Final Estimator: `XGBClassifier` was chosen as the final meta-model because it achieved the highest accuracy (96.14%) among the final estimators tested.
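The stacking setup described above (two tree-based base learners feeding a final estimator) might look roughly like the sketch below. To keep the example dependent on scikit-learn only, `GradientBoostingClassifier` stands in for `XGBClassifier`; that substitution, the synthetic data from `make_classification`, and all hyperparameters are assumptions for illustration, so the printed accuracy says nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class data standing in for the preprocessed PBC features.
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=6,
    n_classes=3,
    random_state=0,
)
y = y + 1  # relabel classes 0/1/2 as stages 1/2/3

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# GradientBoostingClassifier is a stand-in for XGBClassifier (assumption).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=3,  # base-learner predictions for the meta-model come from 3-fold CV
)
stack.fit(X_train, y_train)

accuracy = accuracy_score(y_test, stack.predict(X_test))
print(f"Stacked accuracy on synthetic data: {accuracy:.2%}")
```

Comparing final estimators then amounts to refitting the same `StackingClassifier` with a different `final_estimator` and recording the accuracy on the same held-out split.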
- Test 1:
  - Input: Synthetic random patient data
  - Output:
    - Predicted Cirrhosis Stage: `1` → Stage 1 (Mild)
  - ✔️ Shows the model is capable of detecting low-risk cases.
- Test 2:
  - Input: Real sample from original dataset
  - Output:
    - Predicted Cirrhosis Stage: `3` → Stage 3 (Severe)
  - ✔️ Confirms the model can detect advanced liver damage from clinical data.
- Input patient features (age, lab results, symptoms, etc.) in the required format.
- The model will output the predicted stage of cirrhosis with a corresponding severity label.
- Can assist medical professionals in early detection and prioritization of treatment.
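The mapping from a predicted stage to its severity label can be sketched as a small helper. The names `STAGE_LABELS` and `describe_stage` are hypothetical and introduced only for this example; the labels themselves come from the stage definitions at the top of this README.

```python
# Severity labels for the three stages (hypothetical helper names).
STAGE_LABELS = {1: "Mild", 2: "Moderate", 3: "Severe"}


def describe_stage(stage: int) -> str:
    """Return a label such as 'Stage 1 (Mild)' for a predicted stage."""
    stage = int(stage)  # also accepts NumPy integers returned by predict()
    if stage not in STAGE_LABELS:
        raise ValueError(f"Unknown cirrhosis stage: {stage!r}")
    return f"Stage {stage} ({STAGE_LABELS[stage]})"


print(describe_stage(3))
```

With a fitted classifier, this would typically be called as `describe_stage(model.predict(patient_row)[0])`, where `model` and `patient_row` are placeholders for the trained model and a single preprocessed patient record.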
- Python 3.12.5