This repository contains a purely Pythonic implementation of a Decision Tree Classifier built from scratch without relying on high-level ML libraries for the core logic.
The project demonstrates a deep understanding of tree-based algorithms by manually implementing:
- Splitting Criteria: Entropy (Information Gain) and Gini Impurity; a worked sketch of these formulas follows this list.
- Tree Construction: Recursive partitioning for both categorical and numerical features.
- Prediction Logic: Traversing the learned tree structure to classify new samples.
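As a rough illustration of the splitting criteria, here is a minimal sketch of the underlying formulas in Python; the function names and signatures are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

# Illustrative sketch of the splitting criteria; not the repo's actual API.
def entropy(labels):
    """H(S) = -sum(p * log2(p)) over the class proportions p."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini(S) = 1 - sum(p^2) over the class proportions p."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted
```

For example, a node holding labels `[0, 0, 1, 1]` has entropy 1.0 bit and Gini impurity 0.5.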
It also includes a detailed Manual Tracing Report comparing the custom implementation against sklearn.tree.DecisionTreeClassifier.
- Custom Split Logic: Finds the optimal split by maximizing Information Gain or minimizing Gini Impurity (a split-search sketch follows this feature list).
- Support for Mixed Data: Handles both continuous (numerical) and categorical features automatically.
- Configurable Hyperparameters:
  - max_depth: limits tree growth to prevent overfitting.
  - min_samples_split: the minimum number of samples a node must contain for a split to be attempted.
  - min_information_gain: the minimum gain a candidate split must achieve to be accepted.
- Performance Metrics: Includes a custom confusion matrix evaluation function.
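A minimal sketch of how such a split search might look, reusing the information_gain helper from the sketch above; the DataFrame-based interface and tie-breaking here are assumptions, not the script's verified logic. Numerical features split on a <= threshold, categorical features on equality, and a candidate is kept only if its gain exceeds min_information_gain (max_depth and min_samples_split would be enforced by the recursive tree builder, which is omitted here):

```python
import numpy as np
import pandas as pd

def best_split(X: pd.DataFrame, y: np.ndarray, min_information_gain=0.0):
    """Scan every feature and candidate split value; keep the split with
    the highest information gain, provided it exceeds the threshold.
    Returns (feature_name, split_value, gain) or None if nothing qualifies."""
    best, best_gain = None, min_information_gain
    for name in X.columns:
        column = X[name]
        numeric = pd.api.types.is_numeric_dtype(column)
        for value in column.unique():
            # Numerical features split on a threshold, categorical on equality.
            mask = (column <= value) if numeric else (column == value)
            if mask.all() or not mask.any():
                continue  # degenerate split: one side would be empty
            gain = information_gain(y, [y[mask.values], y[~mask.values]])
            if gain > best_gain:
                best, best_gain = (name, value, gain), gain
    return best
```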
The implementation is based on the concepts detailed in docs/Manual_Calculation_Report.pdf. To get started:
- Clone the repository:

  ```bash
  git clone https://github.com/mariamashraf731/Decision-Tree-From-Scratch.git
  ```
- Install requirements:

  ```bash
  pip install pandas numpy scikit-learn
  ```
- Run the script:

  ```bash
  python src/decision_tree.py
  ```
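If you would rather call the classifier from your own code than run the bundled script, a hypothetical usage sketch might look as follows; the class name, constructor arguments, and confusion-matrix helper are assumptions inferred from the feature list above, and the dataset path is a placeholder:

```python
import pandas as pd

# Hypothetical interface; the actual names in src/decision_tree.py may differ.
from decision_tree import DecisionTreeClassifier, confusion_matrix

df = pd.read_csv("data/train.csv")            # placeholder dataset path
X, y = df.drop(columns="label"), df["label"]

tree = DecisionTreeClassifier(max_depth=5,
                              min_samples_split=10,
                              min_information_gain=1e-4)
tree.fit(X, y)
predictions = tree.predict(X)
print(confusion_matrix(y, predictions))       # the custom evaluation function
```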
- Python: Core logic.
- NumPy & Pandas: Efficient data manipulation.
- Scikit-Learn: Used only for benchmarking and confusion matrix calculation.
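As one way to reproduce that benchmark, the custom tree's predictions can be compared against scikit-learn's reference implementation on the same data (X and y as in the usage sketch above; note that sklearn's trees require numerically encoded features):

```python
from sklearn.tree import DecisionTreeClassifier as SkTree
from sklearn.metrics import accuracy_score, confusion_matrix

# Fit sklearn's tree with comparable hyperparameters; categorical columns
# must be numerically encoded first (e.g., with pd.factorize).
sk_tree = SkTree(criterion="entropy", max_depth=5, random_state=0)
sk_tree.fit(X, y)
sk_predictions = sk_tree.predict(X)

print("sklearn accuracy:", accuracy_score(y, sk_predictions))
print(confusion_matrix(y, sk_predictions))
```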