Live App: 👉 https://titanic-survival-prediction-using-machine-learning.streamlit.app/
This project interprets the Titanic dataset not merely as a table of numbers, but as a multi-dimensional structure, where observable outcomes (survival) emerge from the interplay of hidden variables — much like how M-Theory proposes that our universe is shaped by dimensions beyond direct perception.
In this analogy:
| Machine Learning Concept | M-Theory Interpretation |
|---|---|
| Features (Age, Sex, Fare…) | Dimensions of the system |
| Model Training | Stabilizing a configuration of interacting dimensions |
| Hyperparameters | Geometric constraints on the system |
| Prediction | Emergent behavior from multidimensional dynamics |
| Random Forest | Ensemble of “worlds,” each contributing to the final reality |
This project is built so that even users with no coding background can explore these relationships through an interactive interface.
This repository contains:
- Full data preprocessing pipeline
- Exploratory analysis & visualizations
- Categorical encoding
- Training and tuning of four models
- Performance comparison
- Exporting the best model (Random Forest)
- A full Streamlit web application that delivers:
  - Individual predictions
  - Batch CSV prediction
  - Explanations & UI-friendly design
The project uses the Titanic dataset to predict whether a passenger survived based on seven core features.
In real life, outcomes arise from countless interacting variables. Machine learning approximates this by building a mathematical structure that captures those relationships.
As in M-Theory:
- What we observe (survived/not survived) is a projection from a higher-dimensional structure (Age, Fare, Pclass, Sex…).
- The Random Forest represents many “possible worlds” (decision trees), whose combined vote shapes the final prediction.
- GridSearchCV is the analog of exploring different configurations of the system to find the most stable (most accurate) one.
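The “combined vote” of the forest can be checked directly. The sketch below uses scikit-learn on synthetic stand-in data (an assumption: `make_classification` with 7 features, not the Titanic file) and shows that `RandomForestClassifier` combines its trees by averaging their class probabilities, a soft vote rather than a strict majority count.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 200 samples with 7 features,
# mirroring the 7 Titanic features used in this project.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each fitted decision tree is one "world"; collect its class-probability estimates.
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# The forest's output is the mean over all trees (a soft vote).
combined = per_tree.mean(axis=0)

print(np.allclose(combined, forest.predict_proba(X)))  # True
```

`forest.predict` then picks, for each passenger, the class with the highest averaged probability.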
Thus, this project is not only a technical demonstration but a conceptual visualization of how patterns emerge from data-space.
Libraries used include numpy, pandas, matplotlib, seaborn, and scikit-learn, among others.
Explore the dataset's shape, column types, missing values, and statistical properties.
- Drop Cabin (too sparse)
- Fill Age using mean
- Fill Embarked using mode
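The three cleaning steps above can be sketched in pandas as follows. This is a minimal sketch assuming the standard Kaggle column names (`Cabin`, `Age`, `Embarked`); the `preprocess` helper and the tiny sample frame are hypothetical illustrations, not the project's own code.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the three cleaning steps to a Titanic-style frame."""
    df = df.copy()
    # Cabin is mostly missing, so the whole column is dropped.
    df = df.drop(columns=["Cabin"])
    # Missing ages are replaced by the mean age.
    df["Age"] = df["Age"].fillna(df["Age"].mean())
    # Missing ports of embarkation are replaced by the most frequent port (the mode).
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
    return df

# Tiny hand-made sample (hypothetical values) with a gap in each column.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0],
    "Cabin": [np.nan, "C85", np.nan],
    "Embarked": ["S", "S", np.nan],
})
clean = preprocess(sample)
print(clean)
```

Here the missing age becomes 24.0 (the mean of 22 and 26) and the missing port becomes `S`.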
- Survival counts
- Gender distribution
- Class-based differences
- Combined visual plots
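The numbers behind these plots (survival counts and survival rates by gender and by class) can be computed with a pandas `groupby`. As an illustration, the sketch below uses only the `Survived`, `Pclass`, and `Sex` values of the five sample rows shown in the dataset preview of this README; the `survival_summary` helper is hypothetical, and the project itself visualizes these quantities with matplotlib/seaborn.

```python
import pandas as pd

def survival_summary(df: pd.DataFrame):
    """Hypothetical helper returning the quantities behind the EDA plots."""
    # Overall survival counts (0 = did not survive, 1 = survived).
    counts = df["Survived"].value_counts().sort_index()
    # Survival rate per gender and per passenger class.
    by_sex = df.groupby("Sex")["Survived"].mean()
    by_class = df.groupby("Pclass")["Survived"].mean()
    return counts, by_sex, by_class

# Survived / Pclass / Sex from the five sample passengers in the dataset preview.
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Sex": ["male", "female", "female", "female", "male"],
})
counts, by_sex, by_class = survival_summary(sample)
print(counts, by_sex, by_class, sep="\n\n")
```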
Sex: male→0, female→1
Embarked: S→0, C→1, Q→2
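One way to apply exactly this mapping is pandas `Series.map` with a dictionary. This is a sketch; the project may use a different call (for example `replace`) to produce the same codes. Because `map` turns unseen categories into `NaN`, it should run after `Embarked` has been imputed.

```python
import pandas as pd

# Integer codes exactly as listed above.
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}

def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the categorical strings with their integer codes."""
    df = df.copy()
    df["Sex"] = df["Sex"].map(SEX_CODES)
    df["Embarked"] = df["Embarked"].map(EMBARKED_CODES)
    return df

encoded = encode(pd.DataFrame({"Sex": ["male", "female", "female"],
                               "Embarked": ["S", "C", "Q"]}))
print(encoded)
```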
The models are trained on seven features:
Pclass, Sex, Age, SibSp, Parch, Fare, Embarked
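Selecting these seven columns as inputs and `Survived` as the target, then holding out a test set, might look like the sketch below. The README does not state how the data was split: the 80/20 ratio, `random_state=42`, the `split_features` helper, and the random 10-row demo frame are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]

def split_features(df, test_size=0.2, random_state=42):
    """Hypothetical helper: pick the 7 features and the target, then split."""
    X = df[FEATURES]
    y = df["Survived"]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)

# Random, already-encoded demo rows (hypothetical data, not the Titanic file).
rng = np.random.default_rng(0)
n = 10
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n),
    "Embarked": rng.integers(0, 3, n),
    "Survived": rng.integers(0, 2, n),
})
X_train, X_test, y_train, y_test = split_features(df)
print(len(X_train), len(X_test))  # 8 2
```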
Feature scaling is applied only for SVM and Logistic Regression.
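Scaling only the SVM and Logistic Regression inputs can be expressed with scikit-learn pipelines. This sketch assumes `StandardScaler` (the README does not name the scaler) and uses synthetic stand-in data. The tree-based models are left unscaled because their threshold splits are unaffected by rescaling a feature.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    # Scale-sensitive models get a StandardScaler (assumed scaler) in front of them.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    # Tree-based models split on thresholds, so they are used without scaling.
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Synthetic stand-in data (assumption) with 7 features.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)
for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: training accuracy {model.score(X, y):.3f}")
```

Wrapping the scaler in a pipeline means it is refit on the training folds only during cross-validation, so test-fold statistics never leak into the scaling.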
- Logistic Regression
- Decision Tree
- Random Forest
- SVM
All four models are tuned with GridSearchCV using 5-fold cross-validation.
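A 5-fold grid search for the Random Forest could be set up as below. The parameter grid is hypothetical (the README does not list the grids actually searched), the data is a synthetic stand-in, and the 80/20 split is an assumption. `GridSearchCV` refits the best configuration on the whole training set by default, so `best_estimator_` can be scored directly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption) with 7 features.
X, y = make_classification(n_samples=300, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical search space: 2 x 3 = 6 configurations, each scored on 5 folds.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_model = search.best_estimator_
print("Best parameters:", search.best_params_)
print(f"Training accuracy: {best_model.score(X_train, y_train):.4f}")
print(f"Test accuracy:     {best_model.score(X_test, y_test):.4f}")
```

The same pattern applies to the other three models by swapping the estimator and its grid.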
| Model | Training Acc | Test Acc |
|---|---|---|
| Logistic Regression | 0.8146 | 0.7821 |
| Random Forest | 0.8694 | 0.7933 |
| Decision Tree | 0.8539 | 0.7709 |
| SVM | 0.8567 | 0.7877 |
🏆 Random Forest is the best performer, with the highest test accuracy (0.7933).
The best model is exported as `best_random_forest_model.pkl`.
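The `.pkl` extension suggests Python pickling. A minimal sketch with the standard-library `pickle` module is shown below; whether the project uses `pickle` or `joblib` is an assumption, and the small model here is a stand-in trained on synthetic data. A pickle should only be loaded from a trusted source, and with the same scikit-learn version that created it.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in model (assumption) trained on synthetic data with 7 features.
X, y = make_classification(n_samples=100, n_features=7, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Serialize the fitted model to disk.
with open("best_random_forest_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later (for example when the web app starts), load it back.
with open("best_random_forest_model.pkl", "rb") as f:
    loaded = pickle.load(f)

print((loaded.predict(X) == model.predict(X)).all())  # True
```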
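The app contains:
- Theory, explanations, and a visual introduction.
- Individual prediction: enter passenger details → the app predicts survival.
- Batch prediction: upload a CSV → get fully annotated results.
- An explanation of the model logic, the dataset, and the project's educational purpose.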
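Both prediction modes reduce to the same step: encode the raw columns, call the model, and attach readable results. The sketch below is an assumption about how this could work, not the app's actual code. The `annotate` helper, the output column names, and the stand-in model (trained on random rows with a toy target instead of loading `best_random_forest_model.pkl`) are all hypothetical; the encoding follows the mapping listed earlier.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
SEX_CODES = {"male": 0, "female": 1}
EMBARKED_CODES = {"S": 0, "C": 1, "Q": 2}

def annotate(raw: pd.DataFrame, model) -> pd.DataFrame:
    """Hypothetical helper: encode raw rows, predict, and append readable results."""
    X = raw[FEATURES].copy()
    X["Sex"] = X["Sex"].map(SEX_CODES)
    X["Embarked"] = X["Embarked"].map(EMBARKED_CODES)
    out = raw.copy()
    out["Prediction"] = np.where(model.predict(X) == 1, "Survived", "Did not survive")
    out["Survival probability"] = model.predict_proba(X)[:, 1].round(3)
    return out

# Stand-in model on random encoded rows (assumption); the toy target simply equals Sex.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.integers(0, 2, n),
    "Age": rng.uniform(1, 80, n),
    "SibSp": rng.integers(0, 4, n),
    "Parch": rng.integers(0, 3, n),
    "Fare": rng.uniform(5, 100, n),
    "Embarked": rng.integers(0, 3, n),
})
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(train, train["Sex"])

# A batch upload is a CSV; a single passenger is just a one-row DataFrame.
csv_text = """Name,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
Braund,3,male,22,1,0,7.25,S
Cumings,1,female,38,1,0,71.2833,C
"""
result = annotate(pd.read_csv(io.StringIO(csv_text)), model)
print(result[["Name", "Prediction", "Survival probability"]])
```

In a Streamlit app, the object returned by `st.file_uploader` can be passed to `pd.read_csv` in place of the `StringIO` buffer used here.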
Experience the full interactive application directly in your browser.
No installation, no setup, and no coding skills required.
Explore survival predictions for individual passengers or entire CSV files.
Understand how multidimensional data shapes real-world outcomes.
Dive into the model's behavior through an intuitive, theory-inspired interface.
👉 Launch the App:
https://titanic-survival-prediction-using-machine-learning.streamlit.app/
| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S |
- Incorporating SHAP for explainability → interpret dimensions
- Extending to generative modeling → alternate “universes”
- Using boosted trees → deeper multi-branch reality surfaces
- Adding neural networks → higher-order nonlinear manifolds
This project blends Data Science, theoretical inspiration, and user-friendly interaction. It demonstrates how prediction emerges from structure — just as physical reality may arise from deeper hidden dimensions.