A. Project Overview
This repository contains a structured, behaviour-driven A/B testing analytics pipeline designed to evaluate the impact of experimental changes on key performance metrics. Rather than just running statistical tests, this project treats A/B testing as part of a repeatable decision framework, from hypothesis definition to metric validation and interpretation.
Across many analytics environments — from digital engagement to donor behaviour — A/B testing is a core method for understanding causal impact. This repository demonstrates how to run, evaluate, and interpret tests in a way that is scalable, auditable, and aligned with real decision making.
Although implementations vary across organisations, these principles apply broadly to most data analytics environments.
B. System Architecture
The A/B testing workflow follows an intentional analytic pipeline:
Raw Experiment Data (CSV/Database)
    ↓
Data Cleaning & Validation
    ↓
Metric Definition & Feature Engineering
    ↓
Statistical Analysis (Comparisons & Significance)
    ↓
Summary Outputs & Visualisations
    ↓
Decision Support (Insights / Recommendation)
Each stage is separate and testable, so:
errors can be isolated
assumptions are visible
analysis can be reused or extended
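To illustrate that separation, the stages might be wired together as small functions behind a single entry point. This is a minimal sketch of the idea only; the function names below are hypothetical and not the repository's actual API.

import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical stage functions; names are illustrative, not the repository's API.
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna(subset=["user_id", "group", "metric_value"])

def compare_groups(df: pd.DataFrame) -> dict:
    control = df[df["group"] == "control"]["metric_value"]
    treatment = df[df["group"] == "treatment"]["metric_value"]
    t_stat, p_val = ttest_ind(treatment, control, equal_var=False)
    return {"lift": treatment.mean() - control.mean(), "p_value": p_val}

def run_pipeline(path: str) -> dict:
    # Each stage is isolated, so errors and assumptions stay visible.
    return compare_groups(clean_data(load_data(path)))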
C. Step-by-Step Workflow
Step 1: Data Ingestion & Cleaning
The pipeline starts by loading experimental data — often in CSV format or queried from a database. Text fields, dates, and IDs are standardised.
Example Python snippet:
import pandas as pd
# Load the raw experiment data and standardise key fields.
df = pd.read_csv("ab_test_data.csv")
df["timestamp"] = pd.to_datetime(df["timestamp"])
# Drop rows missing the identifiers or the outcome metric.
df = df.dropna(subset=["user_id", "group", "metric_value"])
Here, missing values are handled consciously rather than ignored, ensuring validity downstream.
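Beyond dropping incomplete rows, a few lightweight sanity checks make the cleaning assumptions explicit. A minimal sketch, reusing the column names above; the check logic is illustrative rather than the repository's exact validation:

# Illustrative validation checks; the rules here are assumptions.
valid_groups = {"control", "treatment"}
unknown = set(df["group"].unique()) - valid_groups
assert not unknown, f"Unexpected group labels: {unknown}"

# A user assigned to both groups would contaminate the comparison.
groups_per_user = df.groupby("user_id")["group"].nunique()
assert (groups_per_user == 1).all(), "Some users appear in more than one group"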
Step 2: Defining Metrics & Groups
In A/B tests, metrics must reflect behavioural outcomes, not technical artifacts.
Typical definitions include:
Conversion rate
Average value per user
Engagement duration
The experiment group and control group are identified:
control = df[df["group"] == "control"]
treatment = df[df["group"] == "treatment"]
This keeps analysis clear and grounded in experimental design.
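With the groups separated, the metric definitions above can be computed side by side. A brief sketch, assuming a binary converted column alongside metric_value (the converted column is an assumption and may not exist in every dataset):

# Illustrative per-group summary; the "converted" column is assumed here.
summary = df.groupby("group").agg(
    users=("user_id", "nunique"),
    conversion_rate=("converted", "mean"),
    avg_value_per_user=("metric_value", "mean"),
)
print(summary)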
Step 3: Statistical Comparison
Once groups are defined, statistical tests determine whether differences are meaningful.
For example, a simple difference in means:
import numpy as np
from scipy.stats import ttest_ind

# Welch's t-test: equal_var=False avoids assuming equal group variances.
t_stat, p_val = ttest_ind(
    treatment["metric_value"],
    control["metric_value"],
    equal_var=False
)
This provides a principled basis for interpreting test outcomes.
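For a binary metric such as conversion rate, a two-proportion z-test is a common complement to the t-test above. A hedged sketch using statsmodels, which is an extra dependency assumed here, along with the converted column from the Step 2 sketch:

# Hypothetical test for a binary outcome; statsmodels is assumed to be installed.
from statsmodels.stats.proportion import proportions_ztest

successes = [treatment["converted"].sum(), control["converted"].sum()]
observations = [len(treatment), len(control)]
z_stat, p_val_conversion = proportions_ztest(count=successes, nobs=observations)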
Step 4: Organising Results
After tests, results are summarised in tabular and visual formats, including:
mean differences
confidence intervals
effect sizes
These outputs are saved and visualised for stakeholder communication.
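As one way to produce those summaries, the mean difference, an approximate 95% confidence interval, and Cohen's d can be derived directly from the two groups. A minimal sketch, assuming a normal approximation for the interval:

import numpy as np

t_vals = treatment["metric_value"].to_numpy()
c_vals = control["metric_value"].to_numpy()

# Mean difference and its standard error (unequal variances, as in the Welch test).
diff = t_vals.mean() - c_vals.mean()
se = np.sqrt(t_vals.var(ddof=1) / len(t_vals) + c_vals.var(ddof=1) / len(c_vals))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

# Cohen's d with a simple pooled standard deviation.
pooled_sd = np.sqrt((t_vals.var(ddof=1) + c_vals.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd

print(f"diff={diff:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f}), d={cohens_d:.2f}")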
Step 5: Decision Support
The final stage isn’t just reporting significance. It connects statistical outcomes to actionable recommendations, such as:
“Launch the new variant because metric X increased with p < 0.05”
“Further investigation required — results are not statistically reliable”
This step is what separates analytics from mere reporting: it ties the results to real decisions.
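One lightweight way to encode this step is a rule that maps the statistical outputs onto a recommendation. The sketch below reuses p_val from Step 3 and diff from the Step 4 sketch; the thresholds are illustrative assumptions, not the project's decision policy.

# Hypothetical decision rule; alpha and the minimum effect are illustrative.
ALPHA = 0.05
MIN_PRACTICAL_EFFECT = 0.01  # smallest lift considered worth acting on

def recommend(p_value: float, effect: float) -> str:
    if p_value < ALPHA and effect >= MIN_PRACTICAL_EFFECT:
        return "Launch the new variant: the lift is statistically and practically significant."
    if p_value < ALPHA:
        return "Significant but small effect: weigh the lift against rollout cost."
    return "Further investigation required: results are not statistically reliable."

print(recommend(p_val, diff))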
D. Why This Matters
Reducing Manual Work
Before pipelines like this, analysts often computed metrics in spreadsheets, copied results manually, and tested significance with scattershot tools. This repository eliminates that by automating:
group separation
metric calculation
statistical testing
visual summarisation
This reduces human error and speeds delivery.
Supporting Operational Decisions
A/B tests drive decisions such as:
feature rollouts
pricing changes
campaign design
user experience improvements
Well-defined tests with clear analysis accelerate organisational learning and reduce risk.
Innovation Beyond Routine Tasks
This project shows how analytics workflows can be:
reproducible
transparent
decision-centric
Unlike ad hoc notebooks, the code here is structured: it can be run consistently across experiments, audited, and explained.
E. Reflection & Learnings
Working on this A/B Testing pipeline reinforced that good analysis is shaped by good definitions.
Some key takeaways:
Clarity in hypothesis matters: A/B testing is only meaningful when the hypothesis is precise and measurable.
Data validation cannot be an afterthought: Bad data leads to meaningless conclusions.
Statistics without interpretation is noise: Test results need to connect to business or operational decisions.
Repeatability improves insight quality: Structuring the pipeline makes insights reliable over time.
From a leadership perspective, this project is less about individual scripts and more about designing a repeatable analytical capability that others can extend and trust.
For analysts looking to improve their practice, the key lesson is: design the analysis so that others — and future you — can understand, trust, and reuse it without friction.
How to Use This Repository
Clone the repository
git clone https://github.com/Kaviya Mahendran/ab_testing_project
Place your experimental dataset (e.g., ab_test_data.csv) in the root directory
Install dependencies
pip install -r requirements.txt
Run the main analysis script
python run_ab_test_analysis.py
Review the structured results and visualisations
Final Note
This repository is not a one-off experiment; it is a blueprint for rigorous A/B testing analytics. It reflects a pipeline mindset that enables clear, evidence-based decisions and invites extension into more advanced techniques such as:
Bayesian A/B regression
Multi-armed bandit testing
Adaptive experiment design