This repository contains the materials for the PyData London 2025 workshop: How To Measure And Mitigate Unfair Bias in Machine Learning Models.
AI tools used in hiring can unintentionally perpetuate discrimination on the basis of protected characteristics such as age, gender, and ethnicity, leading to significant real-world harm. This workshop provides a practical, hands-on approach to addressing bias in machine learning models, using the example of AI-powered hiring tools.
In this workshop, we will:
- Generate a synthetic dataset of CVs for software engineers, with controlled distributions across gender and race.
- Train a biased model on this dataset to understand how machine learning systems can perpetuate discrimination.
- Evaluate fairness metrics to identify and measure bias in the model across different demographic groups.
- Apply bias mitigation techniques using the Fairlearn library to address the discovered unfairness (a sketch of the measure-and-mitigate loop follows this list).
- Compare the trade-offs between model performance and fairness across different mitigation strategies.
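
As a preview, here is a minimal, self-contained sketch of that measure-then-mitigate loop using Fairlearn and scikit-learn. The toy data, the proxy feature, and the group labels below are illustrative assumptions, not the workshop's actual CV dataset or model:

```python
# Illustrative sketch only: toy data and group labels stand in for the
# workshop's synthetic CV dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from fairlearn.metrics import MetricFrame, selection_rate
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 5))                         # stand-in CV features
group = rng.choice(["group_a", "group_b"], size=n)  # stand-in sensitive feature
X[:, 1] += (group == "group_a")                     # proxy feature leaking group membership
# Inject bias: candidates in group_a are more likely to be labelled "hire".
y = ((X[:, 0] + 0.8 * (group == "group_a")
      + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=0)

# 1. Train a baseline model; it picks up the bias via the proxy feature.
baseline = LogisticRegression().fit(X_tr, y_tr)
y_pred = baseline.predict(X_te)

# 2. Measure fairness: compare selection rates per demographic group.
mf = MetricFrame(
    metrics={"selection_rate": selection_rate},
    y_true=y_te, y_pred=y_pred, sensitive_features=g_te,
)
print("Before mitigation:\n", mf.by_group)

# 3. Mitigate: retrain under a demographic-parity constraint.
mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(), constraints=DemographicParity()
)
mitigator.fit(X_tr, y_tr, sensitive_features=g_tr)
y_fair = mitigator.predict(X_te)

mf_fair = MetricFrame(
    metrics={"selection_rate": selection_rate},
    y_true=y_te, y_pred=y_fair, sensitive_features=g_te,
)
print("After mitigation:\n", mf_fair.by_group)
```

Because `ExponentiatedGradient` retrains the base estimator under the fairness constraint, expect some loss of accuracy in exchange for more equal selection rates; exploring that trade-off is the point of the final exercise.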
By the end of the session, participants will be equipped with the knowledge and tools to tackle bias in their own projects and ensure fairer AI systems.
```bash
git clone https://github.com/john-sandall/fairness-tales-workshop
cd fairness-tales-workshop
```

Choose your preferred package manager:
**Poetry**

```bash
poetry install
```

**pip**

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

**uv**

```bash
uv venv
uv pip install -r pyproject.toml --all-extras
```

To generate the synthetic CV data, you need an OpenAI API key.
```bash
cp .env.example .env
```

Then edit the `.env` file to add your API key:

```
OPENAI_API_KEY="sk-..."
```
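
For reference, here is a minimal sketch of how a notebook might pick up the key and call the OpenAI API. It assumes the `python-dotenv` package is installed; the model name and prompt are placeholders, not the workshop's actual settings:

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY from .env into the environment

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Write a short CV for a software engineer."}],
)
print(response.choices[0].message.content)
```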
The workshop consists of two main notebooks:
- `notebooks/1 - Generate CVs.ipynb`: Creates a synthetic dataset of CVs
- `notebooks/2 - Model.ipynb`: Demonstrates bias detection and mitigation techniques
To run the notebooks:

```bash
jupyter lab
```
- pre-commit: `pre-commit run --all-files --hook-stage=manual`
- poetry sync: `poetry install --with dev`
This project is licensed under the MIT License - see the LICENSE file for details.
