This repository contains a Jupyter notebook, Assignment2_supervised_learning_flow.ipynb, which implements a complete supervised learning workflow for predicting passenger survival in the Titanic disaster. The project covers data loading, exploratory data analysis (EDA), feature engineering, model training, hyperparameter tuning, and performance evaluation across several machine learning models.
The Titanic dataset is used, containing passenger information such as:
- Pclass: Passenger class (1st, 2nd, or 3rd)
- Sex: Gender of the passenger
- Age: Age of the passenger
- SibSp: Number of siblings or spouses aboard
- Parch: Number of parents or children aboard
- Fare: Ticket fare
- Embarked: Port of embarkation
- Survived: Survival status (0 = did not survive, 1 = survived)
The dataset is split into training (titanic_train.csv) and test (titanic_test.csv) sets.
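A minimal loading sketch, assuming the two CSV files sit next to the notebook and use the column names listed above:

```python
import pandas as pd

# Load the training and test splits (assumed to be in the notebook's directory).
train_df = pd.read_csv("titanic_train.csv")
test_df = pd.read_csv("titanic_test.csv")

# Quick sanity check of the columns described above.
print(train_df[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked", "Survived"]].head())
print(train_df.shape, test_df.shape)
```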
The notebook is organized into the following parts:
- Student Details: Information about the contributors.
- AI Assistance: Documentation of AI tools (ChatGPT and Grok) used for guidance, including prompts for dataset explanation, code generation, and visualization.
- Learning Problem: Description of the binary classification task to predict survival based on passenger features.
- Initial Preparations:
  - Loading the dataset using pandas.
  - Exploratory Data Analysis (EDA) with summary statistics and visualizations of the continuous variables (Age, Fare); see the EDA sketch after this list.
- Model Training:
  - Training K-Nearest Neighbors (KNN), Decision Tree, and Naive Bayes models.
  - Hyperparameter tuning using GridSearchCV for KNN and the Decision Tree; see the tuning and evaluation sketch after this list.
- Model Evaluation:
  - Evaluating the models with accuracy, precision, recall, and F1 score.
  - Applying the best Decision Tree model (with its preprocessing pipeline) to the test set.
- Performance Estimation: Comparison of predicted vs. actual survival outcomes on the test set.
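A minimal EDA sketch along these lines (the exact statistics and plots in the notebook may differ):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

train_df = pd.read_csv("titanic_train.csv")

# Summary statistics for the numeric features.
print(train_df.describe())

# Distributions of the continuous variables Age and Fare.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.histplot(train_df["Age"].dropna(), bins=30, ax=axes[0])
axes[0].set_title("Age distribution")
sns.histplot(train_df["Fare"], bins=30, ax=axes[1])
axes[1].set_title("Fare distribution")
plt.tight_layout()
plt.show()
```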
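The tuning and evaluation steps might look roughly like the sketch below. The preprocessing pipeline, parameter grids, and random_state are illustrative assumptions, not the notebook's exact choices; only the overall flow (GridSearchCV for KNN and the Decision Tree, then the four metrics on the test set) mirrors the description above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

train_df = pd.read_csv("titanic_train.csv")
test_df = pd.read_csv("titanic_test.csv")
X_train, y_train = train_df.drop(columns="Survived"), train_df["Survived"]
X_test, y_test = test_df.drop(columns="Survived"), test_df["Survived"]

# Illustrative preprocessing: impute and scale numerics, one-hot encode categoricals.
numeric = ["Age", "SibSp", "Parch", "Fare"]
categorical = ["Pclass", "Sex", "Embarked"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

# Grid searches for KNN and the Decision Tree (illustrative parameter grids).
searches = {
    "knn": GridSearchCV(
        Pipeline([("prep", preprocess), ("model", KNeighborsClassifier())]),
        {"model__n_neighbors": [3, 5, 7, 9], "model__weights": ["uniform", "distance"]},
        cv=5, scoring="accuracy"),
    "decision_tree": GridSearchCV(
        Pipeline([("prep", preprocess), ("model", DecisionTreeClassifier(random_state=42))]),
        {"model__criterion": ["gini", "entropy"], "model__max_depth": [3, 5, 10, None],
         "model__min_samples_split": [2, 5], "model__min_samples_leaf": [1, 2]},
        cv=5, scoring="accuracy"),
}

# Fit each search on the training split, then report the four metrics on the test split.
for name, search in searches.items():
    search.fit(X_train, y_train)
    y_pred = search.best_estimator_.predict(X_test)
    print(name, search.best_params_)
    print(f"  accuracy={accuracy_score(y_test, y_pred):.3f}"
          f"  precision={precision_score(y_test, y_pred):.3f}"
          f"  recall={recall_score(y_test, y_pred):.3f}"
          f"  f1={f1_score(y_test, y_pred):.3f}")
```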
To run the notebook, install the following Python libraries:
- pandas
- matplotlib
- seaborn
- scikit-learn
- Clone this repository.
- Ensure the dataset files (titanic_train.csv and titanic_test.csv) are in the same directory as the notebook.
- Open the Jupyter notebook (Assignment2_supervised_learning_flow.ipynb).
- Run all cells to execute the full machine learning pipeline.
- Best Model: Decision Tree with hyperparameters criterion='entropy', max_depth=10, min_samples_split=2, min_samples_leaf=1 (see the sketch below).
- Test Set Performance:
  - Accuracy: 0.782
  - Precision: 0.739
  - Recall: 0.557
  - F1 Score: 0.636
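A hedged sketch of refitting the reported best Decision Tree configuration; the preprocessing pipeline and random_state here are illustrative assumptions, not necessarily the notebook's exact setup.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

train_df = pd.read_csv("titanic_train.csv")
test_df = pd.read_csv("titanic_test.csv")

# Illustrative preprocessing, mirroring the tuning sketch above.
numeric = ["Age", "SibSp", "Parch", "Fare"]
categorical = ["Pclass", "Sex", "Embarked"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

# Reported best hyperparameters for the Decision Tree.
best_tree = Pipeline([
    ("prep", preprocess),
    ("model", DecisionTreeClassifier(criterion="entropy", max_depth=10,
                                     min_samples_split=2, min_samples_leaf=1,
                                     random_state=42)),
])

best_tree.fit(train_df.drop(columns="Survived"), train_df["Survived"])
test_pred = best_tree.predict(test_df.drop(columns="Survived"))

# Predicted vs. actual survival on the test set.
print(test_df.assign(Predicted=test_pred)[["Survived", "Predicted"]].head(10))
```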
The notebook includes a detailed comparison of model performance and a table of predicted vs. actual survival outcomes for the test set.
- 📞 Phone: +972 58-5060699
- 📧 Email: [email protected]
- 🔗 LinkedIn: Shalev Atsis
- 📞 Phone: +972 53-3454053
- 📧 Email: [email protected]
- 🔗 LinkedIn: Tomer Golan
- 📞 Phone: +972 52-7729726
- 📧 Email: [email protected]
- 🔗 LinkedIn: Shahar Rushetzky
Computer Science Students, HIT College
The project used OpenAI's ChatGPT and xAI's Grok for guidance on explaining the dataset and for generating code for data preprocessing, visualization, and hyperparameter tuning. The full prompts are documented in the notebook.
This project is licensed under the MIT License. See the LICENSE file for details.