
Resources from a live workshop on Data Pre-Processing & Visualization for Machine Learning, covering core concepts, hands-on practicals, and interactive Q&A. Designed to help students understand how raw data is cleaned, explored, and prepared before applying ML models.


muhammadfahd/Data-Pre-processing-Visualization


Data Pre-Processing & Visualization for Machine Learning


Live Workshop Resources — by M Fahad Bashir

This repository contains learning resources from a live hands-on workshop focused on Data Pre-Processing and Visualization, two critical steps performed before applying Machine Learning algorithms.
The session combined conceptual understanding, practical implementation, and interactive Q&A to help students work with real-world data confidently.


🎯 Workshop Overview

In this workshop, we explored how raw, unclean data is transformed into clean, meaningful data using preprocessing techniques and visualization.
Participants learned why these steps are necessary, how to apply them, and how to choose the right preprocessing step for the data at hand.

Delivered LIVE on Zoom: 14 December 2025
Audience: University students & beginners in Machine Learning


📁 Repository Contents

📘 1. Slides

  • Conceptual explanation of:
    • What data is and why preprocessing is required
    • Common data issues (missing values, outliers, categorical data)
    • Feature scaling and train-test split
    • Importance of data visualization in ML
  • Beginner-friendly explanations with real-world analogies
  • Used during the live workshop session

🔗 Slides link
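
As a rough illustration of the feature scaling and train-test split topics above, here is a minimal sketch using scikit-learn. The toy data and the column names (`heart_rate`, `step_count`, `active`) are assumptions made up for this example and are not taken from the workshop material. The point it tries to show is the order of operations: split first, then fit the scaler on the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "heart_rate": [62, 75, 80, 68, 90, 71, 85, 77, 66, 73],
    "step_count": [3000, 8000, 12000, 4500, 15000, 7000, 11000, 9000, 3500, 6000],
    "active": [0, 1, 1, 0, 1, 0, 1, 1, 0, 0],
})

X = df[["heart_rate", "step_count"]]
y = df["active"]

# Split first, so the test rows never influence the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on test rows

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the full dataset before splitting would let information from the test rows leak into training, which is why the sketch calls `fit_transform` only on `X_train`.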


📓 2. Jupyter Notebook (Hands-on Practical)

  • End-to-end implementation of:
    • Loading and inspecting raw data
    • Handling missing values and duplicates
    • Encoding categorical features correctly
    • Feature scaling
    • Visualizing data using histograms, box plots, and heatmaps
  • Includes step-by-step explanations and reasoning
  • Designed for live demonstration and self-practice

Notebooks:

  1. Working on Unclean Smart Watch Records
  2. Student Performance Record
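
The cleaning steps listed for the notebooks (duplicates, missing values, categorical encoding) could look roughly like the following pandas sketch. This is not the notebook code: the small DataFrame, its column names, and the choice of median and mode imputation are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records; columns and values are made up for this sketch.
raw = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "heart_rate": [72.0, 80.0, 80.0, np.nan, 65.0],
    "activity_level": ["Active", "Sedentary", "Sedentary", "active ", None],
})

# 1. Remove exact duplicate rows (the second user_id 2 row).
clean = raw.drop_duplicates().copy()

# 2. Normalise inconsistent text labels before encoding ("Active", "active ").
clean["activity_level"] = clean["activity_level"].str.strip().str.lower()

# 3. Fill missing values: median for the numeric column, most frequent label
#    for the categorical one. Other strategies may suit other data better.
clean["heart_rate"] = clean["heart_rate"].fillna(clean["heart_rate"].median())
most_common = clean["activity_level"].mode()[0]
clean["activity_level"] = clean["activity_level"].fillna(most_common)

# 4. One-hot encode the categorical column.
encoded = pd.get_dummies(clean, columns=["activity_level"])

print(encoded)
```

Cleaning the labels before encoding matters here: without step 2, `"Active"` and `"active "` would become two separate one-hot columns.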


3. Dataset

  • Smartwatch health dataset used during the workshop
  • Intentionally unclean to simulate real-world scenarios
  • Used to demonstrate:
    • Data quality issues
    • Visualization-driven preprocessing decisions
    • Difference between raw and cleaned data

📁 Dataset file:
unclean_smartwatch_health_data.csv
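
To make "visualization-driven preprocessing decisions" concrete, the sketch below draws a histogram, a box plot, and a correlation heatmap with matplotlib, then flags outliers with the 1.5 × IQR rule (the rule matplotlib's box plot uses by default for its whiskers). The data and column names are invented for the example; the real columns of `unclean_smartwatch_health_data.csv` are not assumed here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; 250 is a deliberately implausible heart rate.
df = pd.DataFrame({
    "heart_rate": [62, 75, 80, 68, 90, 71, 85, 77, 66, 250],
    "step_count": [3000, 8000, 12000, 4500, 15000, 7000, 11000, 9000, 3500, 6000],
    "sleep_hours": [8.0, 7.0, 6.5, 7.5, 5.0, 7.0, 6.0, 6.5, 8.0, 7.0],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: shows the overall shape of the distribution.
axes[0].hist(df["heart_rate"], bins=10)
axes[0].set_title("heart_rate histogram")

# Box plot: points beyond the whiskers are outlier candidates.
axes[1].boxplot(df["heart_rate"])
axes[1].set_title("heart_rate box plot")

# Heatmap: pairwise correlations between numeric columns.
corr = df.corr()
im = axes[2].imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[2].set_xticks(range(len(corr.columns)))
axes[2].set_xticklabels(corr.columns, rotation=45, ha="right")
axes[2].set_yticks(range(len(corr.columns)))
axes[2].set_yticklabels(corr.columns)
axes[2].set_title("correlation heatmap")
fig.colorbar(im, ax=axes[2])
fig.tight_layout()

# Turn what the box plot shows into a rule: flag values outside 1.5 * IQR.
q1, q3 = df["heart_rate"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["heart_rate"] < lower) | (df["heart_rate"] > upper)]

print(outliers)
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped so the figure renders inline. Whether a flagged row is removed, capped, or kept is a judgment call that depends on the data; the sketch only identifies candidates.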


Key Learning Outcomes

By using these resources, learners will be able to:

  • Understand why preprocessing is essential before ML
  • Identify and fix common data quality problems
  • Use visualization to guide preprocessing decisions
  • Prepare real-world data for machine learning models

🙌 Acknowledgment

Thanks to everyone who joined the live session and actively participated in the Q&A.
Your engagement made the workshop interactive and impactful!


⭐ Support

If you find this repository helpful:

  • Star the repo
  • Share it with others learning Machine Learning
  • Open an issue with questions or suggestions

Happy Learning 🚀
