A repository to help TVET curriculum educators explore practical machine learning concepts, structured around the RTB curriculum. This repo will be updated with new learning outcomes soon, so check back for updates.
This Jupyter notebook demonstrates key data preprocessing techniques using a small synthetic dataset. It is designed as a learning tool for teachers, students, and beginners in machine learning.
You are welcome to contribute to this repo if you are passionate about helping others, especially educators.
- Handling missing data
- Feature encoding:
- Label Encoding
- Binary Encoding
- Target Encoding (with and without smoothing)
- Feature scaling (MinMax Scaler, Standard Scaler)
- Date/time feature extraction and transformation, plus cyclical encoding (sine and cosine representation, which captures the closeness between December (12) and January (1))
- Correlation analysis and interpretation
- Normality testing (e.g., Kolmogorov–Smirnov test)
- Dropping irrelevant features
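The cyclical encoding item above can be sketched with a small made-up example: mapping month numbers onto sine and cosine components so that December (12) and January (1) sit close together on the circle, instead of 11 apart as in a plain numeric encoding.

```python
import numpy as np
import pandas as pd

# Hypothetical example: map month numbers (1-12) onto a circle so that
# December (12) and January (1) end up close together.
df = pd.DataFrame({"month": [1, 6, 12]})
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)

# Distance between December and January on the circle is small (~0.52),
# while June sits on the opposite side.
dec = df.loc[df["month"] == 12, ["month_sin", "month_cos"]].values[0]
jan = df.loc[df["month"] == 1, ["month_sin", "month_cos"]].values[0]
dist_dec_jan = np.linalg.norm(dec - jan)
print(round(dist_dec_jan, 2))  # 0.52
```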
`data cleaning with syntetic data.ipynb` – Main notebook with all examples and explanations.
`Data.csv` – Small dataset created for teaching purposes.
As we learn by doing real projects, this banknote classification project focuses on classifying banknotes as FAKE or REAL using machine learning algorithms. It uses a dataset of features extracted from banknote images, including statistical properties such as variance, skewness, kurtosis, and entropy.
🧠 Models Used:
- K-Nearest Neighbor (KNN)
- Logistic Regression
- Choosing the best value of K for KNN using K-fold cross-validation
- Training the KNN model
- Training the Logistic Regression model
- Evaluation Metrics:
- Accuracy
- Precision
- Recall
- F1 score
- Confusion Matrix
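The K-selection step above can be sketched as follows. This uses synthetic data as a stand-in for the banknote features (the real notebook loads `BankNote_Authentication` from Kaggle), so the dataset and the exact K range here are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 4 banknote features
# (variance, skewness, kurtosis, entropy).
X, y = make_classification(n_samples=400, n_features=4, random_state=42)

# Try odd K values and keep the one with the best 5-fold CV accuracy.
scores = {}
for k in range(1, 20, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"Best K = {best_k} (CV accuracy = {scores[best_k]:.3f})")
```

Odd K values avoid ties in the majority vote for binary classification.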
`BANK NOTE CLASSIFICATION` – Main notebook with all examples and explanations.
`BankNote_Authentication.xls` – Small dataset from the Kaggle platform.
As we continue learning through real projects, this Flower Classification project uses a Convolutional Neural Network (CNN) to classify images into five flower types:
🧱 Custom CNN with 3 Convolutional Layers
🖼️ Image classification using CNN
🛠️ Building & training a 3-layer CNN from scratch
📊 Visualizing training accuracy and loss
📈 Evaluation Metrics:
✅ Accuracy
🎯 Precision
🔁 Recall
🧮 F1 Score
🔍 Confusion Matrix analysis
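All of these metrics can be computed with `sklearn.metrics`; here is a tiny illustrative example with made-up true and predicted labels, just to show the API calls.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Made-up binary labels for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))    # 0.75
print(precision_score(y_true, y_pred))   # 0.75
print(recall_score(y_true, y_pred))      # 0.75
print(f1_score(y_true, y_pred))          # 0.75
print(confusion_matrix(y_true, y_pred))  # [[3 1]
                                         #  [1 3]]
```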
- 📁 Dataset Handling & Preprocessing: split-folders is used to automatically split the dataset (flower_images) into training and validation sets while preserving class folders.
```python
import splitfolders

splitfolders.ratio("flower_images", output="flowers_split", seed=1337, ratio=(.8, .2))
```
ImageDataGenerator (tensorflow.keras.preprocessing.image): For real-time data augmentation and loading of images during training.
PIL (Python Imaging Library) Used internally by Keras for image processing (e.g., resizing, loading images).
- 🧰 Model Development TensorFlow / Keras: Main deep learning library used for defining, training, and saving the Convolutional Neural Network.
CNN Architecture: Built using Keras Sequential API with layers like:
Conv2D, MaxPooling2D, Flatten, Dense, Dropout
Model Saving: saved using the legacy .h5 format or the recommended .keras format:
```python
model.save("model_name.h5")  # or model.save("model_name.keras")
```
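Putting the layer list and the saving step together, a minimal sketch of a 3-convolutional-layer network is shown below. The 128×128 input size, filter counts, and the `flower_cnn.keras` filename are assumptions for illustration; the notebook's actual values may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of a 3-conv-layer CNN for the five flower classes,
# assuming 128x128 RGB inputs.
model = keras.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(5, activation="softmax"),  # one output per flower class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.save("flower_cnn.keras")  # recommended format; .h5 also works
```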
- 📊 Evaluation matplotlib: For visualizing training/validation accuracy and loss.
sklearn.metrics: For computing the confusion matrix and evaluating model performance on the test set.
- 🧪 Testing & Prediction: numpy for array manipulation when loading test images.
tensorflow.keras.preprocessing.image: For manually loading and preprocessing a single test image using `image.load_img()`, `image.img_to_array()`, `np.expand_dims()`, etc.
- 🌐 User Interface: Gradio is used to build a simple web interface where users can upload an image and get the predicted flower class.
Features: File Upload, Live Prediction, Easy Web UI Integration
`CNN_FLOWER_CLASSIFICATION.ipynb` – Main notebook with full model implementation.
📥 Dataset (1,000+ images per class): ⬇️ [Download from Google Drive](https://drive.google.com/file/d/1ZuMforenbdcq3rLNa9itBawdZNqk-PCY/view?usp=sharing)
Make sure you have Python 3.x and pip installed. You can install Python from the official website.
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pandas
numpy
scikit-learn
category_encoders
matplotlib
seaborn
scipy
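The packages listed above can be installed inside the activated virtual environment; the `requirements.txt` path assumes the list is saved under that name at the repo root.

```shell
# install directly:
pip install pandas numpy scikit-learn category_encoders matplotlib seaborn scipy

# or, if the list above is saved as requirements.txt:
pip install -r requirements.txt
```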
NIYONSHUTI Yves
Assistant Lecturer – Rwanda Polytechnic, Tumba College
Founder & CEO – Mpuza Inc.
📧 yniyonshuti@rp.ac.rw 📧 info@mpuza.com
https://mpuza.com https://www.linkedin.com/company/mpuza/?viewAsMember=true
📞 +250 786 397 515
Contact me for any further explanation or technical support.
This repository serves as a teaching resource to help learners understand and practice data preprocessing techniques before moving on to real-world data science problems.
Feel free to use and share this notebook with attribution. The content is intended for educational purposes.