A repository to help TVET curriculum educators explore practical machine learning concepts, structured around the RTB curriculum. This repo will be updated with new learning outcomes soon, so check back for updates.
This Jupyter notebook demonstrates key data preprocessing techniques using a small synthetic dataset. It is designed as a learning tool for teachers, students, and beginners in machine learning.
You are welcome to contribute to this repo if you are passionate about helping others, especially educators.
- Handling missing data
- Feature encoding:
- Label Encoding
- Binary Encoding
- Target Encoding (with and without smoothing)
- Feature scaling (MinMax Scaler, Standard Scaler)
- Date/time feature extraction and transformation, plus cyclical encoding (sine and cosine representation, which captures the closeness between December (12) and January (1))
- Correlation analysis and interpretation
- Normality testing (e.g., Kolmogorov–Smirnov test)
- Dropping irrelevant features
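The cyclical encoding item above can be sketched with a small made-up example: mapping month numbers onto sine and cosine components so that December (12) and January (1) sit close together on the circle, instead of 11 apart as in a plain numeric encoding.

```python
import numpy as np
import pandas as pd

# Hypothetical example: map month numbers (1-12) onto a circle so that
# December (12) and January (1) end up close together.
df = pd.DataFrame({"month": [1, 6, 12]})
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)

# Distance between December and January on the circle is small (~0.52),
# while June sits on the opposite side.
dec = df.loc[df["month"] == 12, ["month_sin", "month_cos"]].values[0]
jan = df.loc[df["month"] == 1, ["month_sin", "month_cos"]].values[0]
dist_dec_jan = np.linalg.norm(dec - jan)
print(round(dist_dec_jan, 2))  # 0.52
```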
`data cleaning with syntetic data.ipynb` – Main notebook with all examples and explanations.
`Data.csv` – Small dataset created for teaching purposes.
As we learn by doing real projects, this banknote classification project focuses on classifying banknotes as FAKE or REAL using machine learning algorithms. It uses a dataset of features extracted from banknote images, including statistical properties such as variance, skewness, kurtosis, and entropy.
🧠 Models Used:
- K-Nearest Neighbor (KNN)
- Logistic Regression
- Choosing the best value of K for KNN using K-fold cross-validation
- Training the KNN model
- Training the Logistic Regression model
- Evaluation Metrics:
- Accuracy
- Precision
- Recall
- F1 score
- Confusion Matrix
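The K-selection step above can be sketched as follows. This uses synthetic data as a stand-in for the banknote features (the real notebook loads `BankNote_Authentication` from Kaggle), so the dataset and the exact K range here are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 4 banknote features
# (variance, skewness, kurtosis, entropy).
X, y = make_classification(n_samples=400, n_features=4, random_state=42)

# Try odd K values and keep the one with the best 5-fold CV accuracy.
scores = {}
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best K = {best_k} (CV accuracy = {scores[best_k]:.3f})")
```

Odd K values avoid ties in the majority vote for binary classification.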
`BANK NOTE CLASSIFICATION` – Main notebook with all examples and explanations.
`BankNote_Authentication.xls` – Small dataset from the Kaggle platform.
As we continue learning through real projects, this Flower Classification project uses a Convolutional Neural Network (CNN) to classify images into five flower types:
🧱 Custom CNN with 3 Convolutional Layers
🖼️ Image classification using CNN
🛠️ Building & training a 3-layer CNN from scratch
📊 Visualizing training accuracy and loss
📈 Evaluation Metrics:
✅ Accuracy
🎯 Precision
🔁 Recall
🧮 F1 Score
🔍 Confusion Matrix analysis
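All of these metrics can be computed with `sklearn.metrics`; here is a tiny illustrative example with made-up true and predicted labels, just to show the API calls.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Made-up binary labels for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))    # 0.75
print(precision_score(y_true, y_pred))   # 0.75
print(recall_score(y_true, y_pred))      # 0.75
print(f1_score(y_true, y_pred))          # 0.75
print(confusion_matrix(y_true, y_pred))  # [[3 1]
                                         #  [1 3]]
```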
- 📁 Dataset Handling & Preprocessing: split-folders is used to automatically split the dataset (flower_images) into training and validation sets while preserving class folders.
```python
import splitfolders

splitfolders.ratio("flower_images", output="flowers_split", seed=1337, ratio=(.8, .2))
```
ImageDataGenerator (tensorflow.keras.preprocessing.image): For real-time data augmentation and loading of images during training.
PIL (Python Imaging Library) Used internally by Keras for image processing (e.g., resizing, loading images).
- 🧰 Model Development TensorFlow / Keras: Main deep learning library used for defining, training, and saving the Convolutional Neural Network.
CNN Architecture: Built using Keras Sequential API with layers like:
Conv2D, MaxPooling2D, Flatten, Dense, Dropout
Model Saving: saved using the legacy .h5 format or the recommended .keras format:
```python
model.save("model_name.h5")  # or model.save("model_name.keras")
```
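Putting the layer list and the saving step together, a minimal sketch of a 3-convolutional-layer network is shown below. The 128×128 input size, filter counts, and the `flower_cnn.keras` filename are assumptions for illustration; the notebook's actual values may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of a 3-conv-layer CNN for the five flower classes,
# assuming 128x128 RGB inputs.
model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),  # one output per flower class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.save("flower_cnn.keras")  # recommended format; .h5 also works
```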
- 📊 Evaluation matplotlib: For visualizing training/validation accuracy and loss.
sklearn.metrics: For computing the confusion matrix and evaluating model performance on the test set.
- 🧪 Testing & Prediction: numpy for array manipulation when loading test images.
tensorflow.keras.preprocessing.image: For manually loading and preprocessing a single test image using `image.load_img()`, `image.img_to_array()`, `np.expand_dims()`, etc.
- 🌐 User Interface: Gradio is used to build a simple web interface where users can upload an image and get the predicted flower class.
Features: File Upload, Live Prediction, Easy Web UI Integration
`CNN_FLOWER_CLASSIFICATION.ipynb` – Main notebook with full model implementation.
📥 Dataset (1,000+ images per class): ⬇️ [Download from Google Drive](https://drive.google.com/file/d/1ZuMforenbdcq3rLNa9itBawdZNqk-PCY/view?usp=sharing)
Make sure you have Python 3.x and pip installed. You can install Python from the official website.
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pandas
numpy
scikit-learn
category_encoders
matplotlib
seaborn
scipy
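The packages listed above can be installed inside the activated virtual environment; the `requirements.txt` path assumes the list is saved under that name at the repo root.

```shell
# install directly:
pip install pandas numpy scikit-learn category_encoders matplotlib seaborn scipy

# or, if the list above is saved as requirements.txt:
pip install -r requirements.txt
```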
NIYONSHUTI Yves
Assistant Lecturer – Rwanda Polytechnic, Tumba College
Founder & CEO – Mpuza Inc.
📧 yniyonshuti@rp.ac.rw 📧 info@mpuza.com
https://mpuza.com https://www.linkedin.com/company/mpuza/?viewAsMember=true
📞 +250 786 397 515
Contact me for any further explanation or technical support.
This repository serves as a teaching resource to help learners understand and practice data preprocessing techniques before moving on to real-world data science problems.
Feel free to use and share this notebook with attribution. The content is intended for educational purposes.