diff --git a/.gitignore b/.gitignore
index fd1a6a21..e40cd467 100644
--- a/.gitignore
+++ b/.gitignore
@@ -170,3 +170,4 @@ todos.txt
#
experiments/
+cluster-experiments.code-workspace
diff --git a/README.md b/README.md
index f770fc7e..64b061ad 100644
--- a/README.md
+++ b/README.md
@@ -1,275 +1,284 @@
-# cluster_experiments
+# cluster-experiments
[](https://pepy.tech/project/cluster-experiments)
-[](
-https://pypi.org/project/cluster-experiments/)
+[](https://pypi.org/project/cluster-experiments/)
[](https://github.com/david26694/cluster-experiments/actions)
-[](https://app.codecov.io/gh/david26694/cluster-experiments/)
+[](https://app.codecov.io/gh/david26694/cluster-experiments/)

[](https://pypi.python.org/pypi/cluster-experiments)
-A Python library for end-to-end A/B testing workflows, featuring:
-- Experiment analysis and scorecards
-- Power analysis (simulation-based and normal approximation)
-- Variance reduction techniques (CUPED, CUPAC)
-- Support for complex experimental designs (cluster randomization, switchback experiments)
-## Key Features
+**`cluster-experiments`** is a comprehensive Python library for **end-to-end A/B testing workflows**, from experiment design to statistical analysis.
-### 1. Power Analysis
-- **Simulation-based**: Run Monte Carlo simulations to estimate power
-- **Normal approximation**: Fast power estimation using CLT
-- **Minimum Detectable Effect**: Calculate required effect sizes
-- **Multiple designs**: Support for:
- - Simple randomization
- - Variance reduction techniques in power analysis
- - Cluster randomization
- - Switchback experiments
-- **Dict config**: Easy to configure power analysis with a dictionary
-
-### 2. Experiment Analysis
-- **Analysis Plans**: Define structured analysis plans
-- **Metrics**:
- - Simple metrics
- - Ratio metrics
-- **Dimensions**: Slice results by dimensions
-- **Statistical Methods**:
- - GEE
- - Mixed Linear Models
- - Clustered / regular OLS
- - T-tests
- - Synthetic Control
-- **Dict config**: Easy to define analysis plans with a dictionary
-
-### 3. Variance Reduction
-- **CUPED** (Controlled-experiment Using Pre-Experiment Data):
- - Use historical outcome data to reduce variance, choose any granularity
- - Support for several covariates
-- **CUPAC** (Control Using Predictors as Covariates):
- - Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data
-
-## Quick Start
-
-### Power Analysis Example
+## 📖 What is cluster-experiments?
-```python
-import numpy as np
-import pandas as pd
-from cluster_experiments import PowerAnalysis, NormalPowerAnalysis
+`cluster-experiments` provides a complete toolkit for designing, running, and analyzing experiments, with particular strength in handling **clustered randomization** and complex experimental designs. Originally developed to address challenges in **switchback experiments** and scenarios with **network effects** where standard randomization isn't feasible, it has evolved into a general-purpose experimentation framework supporting both simple A/B tests and other randomization designs.
-# Create sample data
-N = 1_000
-df = pd.DataFrame({
- "target": np.random.normal(0, 1, size=N),
- "date": pd.to_datetime(
- np.random.randint(
- pd.Timestamp("2024-01-01").value,
- pd.Timestamp("2024-01-31").value,
- size=N,
- )
- ),
-})
+### Why "cluster"?
-# Simulation-based power analysis with CUPED
-config = {
- "analysis": "ols",
- "perturbator": "constant",
- "splitter": "non_clustered",
- "n_simulations": 50,
-}
-pw = PowerAnalysis.from_dict(config)
-power = pw.power_analysis(df, average_effect=0.1)
-
-# Normal approximation (faster)
-npw = NormalPowerAnalysis.from_dict({
- "analysis": "ols",
- "splitter": "non_clustered",
- "n_simulations": 5,
- "time_col": "date",
-})
-power_normal = npw.power_analysis(df, average_effect=0.1)
-power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])
+The name reflects the library's origins in handling **cluster-randomized experiments**, where randomization happens at a group level (e.g., stores, cities, time periods) rather than at the individual level. This is critical when:
+- **Spillover/Network Effects**: Treatment of one unit affects others (e.g., testing driver incentives in ride-sharing)
+- **Operational Constraints**: You can't randomize individuals (e.g., testing restaurant menu changes)
+- **Switchback Designs**: Treatment alternates over time periods within the same unit
-# MDE calculation
-mde = npw.mde(df, power=0.8)
+While the library is aimed at these scenarios, it's equally capable of handling standard A/B tests with individual-level randomization.
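
A minimal sketch of group-level randomization, using only the Python standard library rather than the library's own splitters. `assign_clusters` is a hypothetical helper and the store rows are made up for illustration; the point is that the random draw happens once per cluster and every row inherits it.

```python
import random


def assign_clusters(cluster_ids, variants=("control", "treatment"), seed=42):
    """Randomly assign each cluster (not each row) to a variant."""
    rng = random.Random(seed)
    return {cluster: rng.choice(variants) for cluster in sorted(set(cluster_ids))}


# Hypothetical transaction rows: (store_id, purchase_amount)
rows = [("store_a", 12.0), ("store_a", 30.5), ("store_b", 8.0), ("store_c", 21.0), ("store_c", 5.5)]
assignment = assign_clusters(store for store, _ in rows)

# Every row inherits the variant of its store
labelled = [(store, amount, assignment[store]) for store, amount in rows]
for row in labelled:
    print(row)
```

Because all rows of a store share one draw, the effective number of independent randomization units is the number of stores, not the number of rows.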
-# MDE line with length
-mde_timeline = npw.mde_time_line(
- df,
- powers=[0.8],
- experiment_length=[7, 14, 21]
-)
+---
+
+## Key Features
-print(power, power_line_normal, power_normal, mde, mde_timeline)
+### **Experiment Design**
+- **Power Analysis & Sample Size Calculation**
+ - Simulation-based (Monte Carlo) for any design complexity
+ - Analytical (CLT-based) for standard designs
+ - Minimum Detectable Effect (MDE) estimation
+
+- **Multiple Experimental Designs**
+ - Standard A/B tests with individual randomization
+ - Cluster-randomized experiments
+ - Switchback/crossover experiments
+ - Stratified randomization
+ - Observational studies with Synthetic Control
+
+### **Statistical Methods**
+- **Multiple Analysis Methods**
+ - OLS and Clustered OLS regression
+ - GEE (Generalized Estimating Equations)
+ - Mixed Linear Models (MLM)
+ - Delta Method for ratio metrics
+ - Synthetic Control for observational data
+
+- **Variance Reduction Techniques**
+ - CUPED (Controlled-experiment Using Pre-Experiment Data)
+ - CUPAC (Control Using Predictions As Covariates)
+ - Covariate adjustment
+
+### **Analysis Workflow**
+- **Scorecard Generation**: Analyze multiple metrics simultaneously
+- **Multi-dimensional Slicing**: Break down results by segments
+- **Multiple Treatment Arms**: Compare several treatments at once
+- **Ratio Metrics**: Built-in support for conversion rates, averages, etc.
+
+---
+
+## 📦 Installation
+
+```bash
+pip install cluster-experiments
```
-### Experiment Analysis Example
+---
+
+## ⚡ Quick Example
+
+Here's how to run an analysis in just a few lines:
```python
-import numpy as np
import pandas as pd
-from cluster_experiments import AnalysisPlan
+import numpy as np
+from cluster_experiments import AnalysisPlan
+
+np.random.seed(42)
+# 0. Create simple data
N = 1_000
-experiment_data = pd.DataFrame({
- "order_value": np.random.normal(100, 10, size=N),
- "delivery_time": np.random.normal(10, 1, size=N),
- "experiment_group": np.random.choice(["control", "treatment"], size=N),
- "city": np.random.choice(["NYC", "LA"], size=N),
- "customer_id": np.random.randint(1, 100, size=N),
- "customer_age": np.random.randint(20, 60, size=N),
+df = pd.DataFrame({
+ "variant": np.random.choice(["control", "treatment"], N),
+ "orders": np.random.poisson(10, N),
+ "visits": np.random.poisson(100, N),
})
+df["converted"] = (df["orders"] > 0).astype(int)
-# Create analysis plan
+
+# 1. Define your analysis plan
plan = AnalysisPlan.from_metrics_dict({
"metrics": [
- {"alias": "AOV", "name": "order_value"},
- {"alias": "delivery_time", "name": "delivery_time"},
+ {"name": "orders", "alias": "revenue", "metric_type": "simple"},
+ {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"}
],
"variants": [
{"name": "control", "is_control": True},
- {"name": "treatment", "is_control": False},
- ],
- "variant_col": "experiment_group",
- "alpha": 0.05,
- "dimensions": [
- {"name": "city", "values": ["NYC", "LA"]},
+ {"name": "treatment", "is_control": False}
],
- "analysis_type": "clustered_ols",
- "analysis_config": {"cluster_cols": ["customer_id"]},
+ "variant_col": "variant",
+ "analysis_type": "ols"
})
-# Run analysis
-print(plan.analyze(experiment_data).to_dataframe())
+
+# 2. Run analysis on your dataframe
+results = plan.analyze(df)
+print(results.to_dataframe().head())
+```
+
+**Output Example**:
```
+ metric_alias control_variant_name treatment_variant_name control_variant_mean treatment_variant_mean analysis_type ate ate_ci_lower ate_ci_upper p_value std_error dimension_name dimension_value alpha
+0 revenue control treatment 10.08554 9.941061 ols -1.444788e-01 -5.446603e-01 2.557026e-01 0.479186 2.041780e-01 __total_dimension total 0.05
+1 conversion control treatment 1.00000 1.000000 ols 1.110223e-16 -1.096504e-16 3.316950e-16 0.324097 1.125902e-16 __total_dimension total 0.05
+```
+
+---
-### Variance Reduction Example
+## Power Analysis
+
+Design your experiment by estimating required sample size and detectable effects. Here's a complete example using **analytical (CLT-based) power analysis**:
```python
import numpy as np
import pandas as pd
-from cluster_experiments import (
- AnalysisPlan,
- SimpleMetric,
- Variant,
- Dimension,
- TargetAggregation,
- HypothesisTest
-)
+from cluster_experiments import NormalPowerAnalysis
-N = 1000
+# Create sample historical data
+np.random.seed(42)
+N = 500
-experiment_data = pd.DataFrame({
- "order_value": np.random.normal(100, 10, size=N),
- "delivery_time": np.random.normal(10, 1, size=N),
- "experiment_group": np.random.choice(["control", "treatment"], size=N),
- "city": np.random.choice(["NYC", "LA"], size=N),
- "customer_id": np.random.randint(1, 100, size=N),
- "customer_age": np.random.randint(20, 60, size=N),
+historical_data = pd.DataFrame({
+ 'user_id': range(N),
+ 'metric': np.random.normal(100, 20, N),
+ 'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d')
})
-pre_experiment_data = pd.DataFrame({
- "order_value": np.random.normal(100, 10, size=N),
- "customer_id": np.random.randint(1, 100, size=N),
+# Initialize analytical power analysis (fast, CLT-based)
+power_analysis = NormalPowerAnalysis.from_dict({
+ 'analysis': 'ols',
+ 'splitter': 'non_clustered',
+ 'target_col': 'metric',
+ 'time_col': 'date' # Required for mde_time_line
})
-# Define test
-cupac_model = TargetAggregation(
- agg_col="customer_id",
- target_col="order_value"
-)
+# 1. Calculate power for a given effect size
+power = power_analysis.power_analysis(historical_data, average_effect=5.0)
+print(f"Power for detecting +5 unit effect: {power:.1%}")
-hypothesis_test = HypothesisTest(
- metric=SimpleMetric(alias="AOV", name="order_value"),
- analysis_type="clustered_ols",
- analysis_config={
- "cluster_cols": ["customer_id"],
- "covariates": ["customer_age", "estimate_order_value"],
- },
- cupac_config={
- "cupac_model": cupac_model,
- "target_col": "order_value",
- },
+# 2. Calculate Minimum Detectable Effect (MDE) for desired power
+mde = power_analysis.mde(historical_data, power=0.8)
+print(f"Minimum detectable effect at 80% power: {mde:.2f}")
+
+# 3. Power curve: How power changes with effect size
+power_curve = power_analysis.power_line(
+ historical_data,
+ average_effects=[2.0, 4.0, 6.0, 8.0, 10.0]
)
+print(power_curve)
+# Tip: power_line returns a dict mapping effect size to power, so you can plot it with matplotlib:
+# plt.plot(list(power_curve.keys()), list(power_curve.values()))
-# Create analysis plan
-plan = AnalysisPlan(
- tests=[hypothesis_test],
- variants=[
- Variant("control", is_control=True),
- Variant("treatment", is_control=False),
- ],
- variant_col="experiment_group",
+# 4. MDE timeline: How MDE changes with experiment length
+mde_timeline = power_analysis.mde_time_line(
+ historical_data,
+ powers=[0.8],
+ experiment_length=[7, 14, 21, 30]
)
+```
-# Run analysis
-results = plan.analyze(experiment_data, pre_experiment_data)
-print(results.to_dataframe())
+**Output:**
+```
+Power for detecting +5 unit effect: 72.7%
+Minimum detectable effect at 80% power: 5.46
+{2.0: 0.17658708766689768, 4.0: 0.5367343456559069, 6.0: 0.8682558423423066, 8.0: 0.983992856563122, 10.0: 0.9992385426477484}
```
-## Installation
+**Key methods:**
+- `power_analysis()`: Calculate power for a given effect
+- `mde()`: Calculate minimum detectable effect
+- `power_line()`: Generate power curves across effect sizes
+- `mde_time_line()`: Calculate MDE for different experiment lengths
-You can install this package via `pip`.
+For simulation-based power analysis (for complex designs), see the [Power Analysis Guide](https://david26694.github.io/cluster-experiments/power_analysis_guide.html).
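
The CLT-based calculation can be illustrated with the textbook two-sided z-test formulas. This is a standalone sketch using only the standard library, not the library's implementation: `normal_power` and `normal_mde` are hypothetical helper names, and the standard error is assumed to be known rather than estimated from data.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def normal_power(effect: float, std_error: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test for a true effect, given the estimator's standard error."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect / std_error
    # Probability of rejecting in either tail under the alternative
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)


def normal_mde(std_error: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Smallest effect detectable with the requested power (ignores the far tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * std_error


# Illustrative numbers: 250 users per arm, outcome standard deviation of 20
std_error = (20**2 / 250 + 20**2 / 250) ** 0.5
print(f"power at +5: {normal_power(5.0, std_error):.3f}")
print(f"MDE at 80% power: {normal_mde(std_error):.2f}")
```

The values will generally differ from the library's output, since a real power analysis estimates the standard error from the historical data and the chosen analysis method.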
-```bash
-pip install cluster-experiments
-```
+---
+
+## 📚 Documentation
+
+For detailed guides, API references, and advanced examples, visit our [**documentation**](https://david26694.github.io/cluster-experiments/).
+
+### Core Concepts
+
+The library is built around three main components:
+
+#### 1. **Splitter** - Define how to randomize
+
+Choose how to split your data into control and treatment groups:
+
+- `NonClusteredSplitter`: Standard individual-level randomization
+- `ClusteredSplitter`: Cluster-level randomization
+- `SwitchbackSplitter`: Time-based alternating treatments
+- `StratifiedClusteredSplitter`: Balance randomization across strata
+
+#### 2. **Analysis** - Measure the impact
+
+Select the appropriate statistical method for your design:
+
+- `OLSAnalysis`: Standard regression for A/B tests
+- `ClusteredOLSAnalysis`: Clustered standard errors for cluster-randomized designs
+- `TTestClusteredAnalysis`: T-tests on cluster-aggregated data
+- `GeeExperimentAnalysis`: GEE for correlated observations
+- `SyntheticControlAnalysis`: Observational studies with synthetic controls
+
+#### 3. **AnalysisPlan** - Orchestrate your analysis
+
+Define your complete analysis workflow:
+
+- Specify metrics (simple and ratio)
+- Define variants and dimensions
+- Configure hypothesis tests
+- Generate comprehensive scorecards
+
+For **power analysis**, combine these with:
+
+- **Perturbator**: Simulate treatment effects for power calculations
+- **PowerAnalysis**: Estimate statistical power and sample sizes
+
+---
+
+## 🛠️ Advanced Features
-For detailed documentation and examples, visit our [documentation site](https://david26694.github.io/cluster-experiments/).
-
-## Features
-
-The library offers the following classes:
-
-* Regarding power analysis:
- * `PowerAnalysis`: to run power analysis on any experiment design, using simulation
- * `PowerAnalysisWithPreExperimentData`: to run power analysis on a clustered/switchback design, but adding pre-experiment df during split and perturbation (especially useful for Synthetic Control)
- * `NormalPowerAnalysis`: to run power analysis on any experiment design using the central limit theorem for the distribution of the estimator. It can be used to compute the minimum detectable effect (MDE) for a given power level.
- * `ConstantPerturbator`: to artificially perturb treated group with constant perturbations
- * `BinaryPerturbator`: to artificially perturb treated group for binary outcomes
- * `RelativePositivePerturbator`: to artificially perturb treated group with relative positive perturbations
- * `RelativeMixedPerturbator`: to artificially perturb treated group with relative perturbations for positive and negative targets
- * `NormalPerturbator`: to artificially perturb treated group with normal distribution perturbations
- * `BetaRelativePositivePerturbator`: to artificially perturb treated group with relative positive beta distribution perturbations
- * `BetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval
- * `SegmentedBetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval, but using clusters
-* Regarding splitting data:
- * `ClusteredSplitter`: to split data based on clusters
- * `FixedSizeClusteredSplitter`: to split data based on clusters with a fixed size (example: only 1 treatment cluster and the rest in control)
- * `BalancedClusteredSplitter`: to split data based on clusters in a balanced way
- * `NonClusteredSplitter`: Regular data splitting, no clusters
- * `StratifiedClusteredSplitter`: to split based on clusters and strata, balancing the number of clusters in each stratus
- * `RepeatedSampler`: for backtests where we have access to counterfactuals, does not split the data, just duplicates the data for all groups
- * Switchback splitters (the same can be done with clustered splitters, but there is a convenient way to define switchback splitters using switch frequency):
- * `SwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments
- * `BalancedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing treatment and control among all clusters
- * `StratifiedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing the number of clusters in each stratus
- * Washover for switchback experiments:
- * `EmptyWashover`: no washover done at all.
- * `ConstantWashover`: accepts a timedelta parameter and removes the data when we switch from A to B for the timedelta interval.
-* Regarding analysis methods:
- * `GeeExperimentAnalysis`: to run GEE analysis on the results of a clustered design
- * `MLMExperimentAnalysis`: to run Mixed Linear Model analysis on the results of a clustered design
- * `TTestClusteredAnalysis`: to run a t-test on aggregated data for clusters
- * `PairedTTestClusteredAnalysis`: to run a paired t-test on aggregated data for clusters
- * `ClusteredOLSAnalysis`: to run OLS analysis on the results of a clustered design
- * `OLSAnalysis`: to run OLS analysis for non-clustered data
- * `DeltaMethodAnalysis`: to run Delta Method Analysis for clustered designs
- * `TargetAggregation`: to add pre-experimental data of the outcome to reduce variance
- * `SyntheticControlAnalysis`: to run synthetic control analysis
-* Regarding experiment analysis workflow:
- * `Metric`: abstract class to define a metric to be used in the analysis
- * `SimpleMetric`: to create a metric defined at the same level of the data used for the analysis
- * `RatioMetric`: to create a metric defined at a lower level than the data used for the analysis
- * `Variant`: to define a variant of the experiment
- * `Dimension`: to define a dimension to slice the results of the experiment
- * `HypothesisTest`: to define a Hypothesis Test with a metric, analysis method, optional analysis configuration, and optional dimensions
- * `AnalysisPlan`: to define a plan of analysis with a list of Hypothesis Tests for a dataset and the experiment variants. The `analyze()` method runs the analysis and returns the results
- * `AnalysisResults`: to store the results of an analysis
-* Other:
- * `PowerConfig`: to conveniently configure `PowerAnalysis` class
- * `ConfidenceInterval`: to store the data representation of a confidence interval
- * `InferenceResults`: to store the structure of complete statistical analysis results
+### Variance Reduction (CUPED/CUPAC)
+
+Reduce variance and detect smaller effects by leveraging pre-experiment data. Use historical metrics as covariates to control for pre-existing differences between groups.
+
+**Use cases:**
+
+- Have pre-experiment metrics for your users/clusters
+- Want to detect smaller treatment effects
+- Need more sensitive tests with same sample size
+
+See the [CUPAC Example](https://david26694.github.io/cluster-experiments/cupac_example.html) for detailed implementation.
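
The core CUPED adjustment can be sketched in a few lines. This is an illustration on synthetic data with only the standard library, not the library's code path; `cuped_adjust` is a hypothetical helper that uses a single pre-experiment covariate and population moments.

```python
import random
from statistics import fmean, pvariance


def cuped_adjust(outcome, covariate):
    """Return outcome - theta * (covariate - mean(covariate)), with theta = cov(Y, X) / var(X)."""
    mean_x, mean_y = fmean(covariate), fmean(outcome)
    cov_xy = fmean((x - mean_x) * (y - mean_y) for x, y in zip(covariate, outcome))
    theta = cov_xy / pvariance(covariate)
    return [y - theta * (x - mean_x) for x, y in zip(covariate, outcome)]


# Synthetic data: the in-experiment outcome is correlated with its pre-experiment value
rng = random.Random(0)
pre = [rng.gauss(100, 20) for _ in range(2_000)]
post = [0.8 * x + rng.gauss(0, 10) for x in pre]

adjusted = cuped_adjust(post, pre)
print(f"variance before: {pvariance(post):.1f}, after: {pvariance(adjusted):.1f}")
```

The adjusted outcome keeps the same mean as the original, so the treatment effect estimate is unchanged in expectation while its variance shrinks in proportion to the squared correlation between outcome and covariate.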
+
+### Cluster Randomization
+
+Handle experiments where randomization occurs at group level (stores, cities, regions) rather than individual level. Essential for managing spillover effects and operational constraints.
+
+See the [Cluster Randomization Guide](https://david26694.github.io/cluster-experiments/examples/cluster_randomization.html) for details.
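
To get a feel for how much precision group-level randomization costs, the standard design-effect formula can be evaluated directly. This sketch assumes equally sized clusters; `design_effect` is a hypothetical helper and the numbers are illustrative.

```python
import math


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from randomizing clusters instead of individuals."""
    return 1 + (avg_cluster_size - 1) * icc


# Illustrative values: 100 observations per cluster, between-cluster sd 10, within-cluster sd 20
icc = 10**2 / (10**2 + 20**2)  # share of variance that sits between clusters
deff = design_effect(100, icc)
print(f"ICC = {icc:.2f}, design effect = {deff:.1f}, SE inflation = {math.sqrt(deff):.2f}x")
```

Even a moderate intra-cluster correlation inflates standard errors several times over when clusters are large, which is why the number of clusters usually matters more than the number of rows.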
+
+### Switchback Experiments
+
+Design and analyze time-based crossover experiments where the same units receive both control and treatment at different times.
+
+See the [Switchback Example](https://david26694.github.io/cluster-experiments/switchback.html) for implementation.
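
The time-based assignment behind a switchback design can be sketched as follows. This is a standard-library illustration for a single unit, not the library's `SwitchbackSplitter`; `switchback_assignment` and `variant_at` are hypothetical helpers.

```python
import random
from datetime import datetime, timedelta


def switchback_assignment(start, end, switch_every, variants=("control", "treatment"), seed=0):
    """Draw one variant per time window of length `switch_every` between start and end."""
    rng = random.Random(seed)
    windows = {}
    window_start = start
    while window_start < end:
        windows[window_start] = rng.choice(variants)
        window_start += switch_every
    return windows


def variant_at(timestamp, start, switch_every, windows):
    """Look up the variant of the window that contains `timestamp`."""
    n_windows = (timestamp - start) // switch_every  # timedelta // timedelta gives an int
    return windows[start + n_windows * switch_every]


start = datetime(2025, 10, 1)
windows = switchback_assignment(start, start + timedelta(days=1), timedelta(hours=4))
print(variant_at(datetime(2025, 10, 1, 9, 30), start, timedelta(hours=4), windows))
```

Each window acts as a cluster: all observations inside it share one variant, so the analysis should treat the window (or unit and window pair) as the clustering unit.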
+
+---
+
+## 🌟 Support
+
+- ⭐ Star us on [GitHub](https://github.com/david26694/cluster-experiments)
+- 📝 Read the [documentation](https://david26694.github.io/cluster-experiments/)
+- 🐛 Report issues on our [issue tracker](https://github.com/david26694/cluster-experiments/issues)
+- 💬 Join discussions in [GitHub Discussions](https://github.com/david26694/cluster-experiments/discussions)
+
+---
+
+## 📚 Citation
+
+If you use cluster-experiments in your research, please cite:
+
+```bibtex
+@software{cluster_experiments,
+ author = {David Masip and contributors},
+ title = {cluster-experiments: A Python library for designing and analyzing experiments},
+ url = {https://github.com/david26694/cluster-experiments},
+ year = {2022}
+}
+```
diff --git a/docs/examples/cluster_randomization.ipynb b/docs/examples/cluster_randomization.ipynb
new file mode 100644
index 00000000..21b71313
--- /dev/null
+++ b/docs/examples/cluster_randomization.ipynb
@@ -0,0 +1,395 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Cluster Randomization Example\n",
+ "\n",
+ "This notebook demonstrates how to analyze a **cluster-randomized experiment** where randomization occurs at the group level (e.g., stores, cities, schools) rather than at the individual level.\n",
+ "\n",
+ "## Why Cluster Randomization?\n",
+ "\n",
+ "Cluster randomization is necessary when:\n",
+ "\n",
+ "1. **Spillover Effects**: Treatment of one individual affects others (e.g., testing driver incentives in ride-sharing)\n",
+ "2. **Operational Constraints**: You can't randomize at the individual level (e.g., testing a store layout)\n",
+ "3. **Cost Efficiency**: It's cheaper to randomize groups than individuals\n",
+ "\n",
+ "## Key Consideration\n",
+ "\n",
+ "With cluster randomization, you need to account for **intra-cluster correlation** - observations within the same cluster are more similar than observations from different clusters. This requires using **clustered standard errors** or cluster-level analysis methods.\n",
+ "\n",
+ "## Setup\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "from cluster_experiments import AnalysisPlan\n",
+ "\n",
+ "# Set random seed for reproducibility\n",
+ "np.random.seed(42)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Simulate Cluster-Randomized Experiment\n",
+ "\n",
+ "Let's simulate an experiment where we test a promotional campaign across different stores. Each store is randomly assigned to control or treatment, and we observe multiple transactions per store.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Total transactions: 5,055\n",
+ "Stores in control: 23\n",
+ "Stores in treatment: 27\n",
+ "\n",
+ "First few rows:\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>store_id</th>\n",
+ "      <th>variant</th>\n",
+ "      <th>purchase_amount</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>0</td>\n",
+ "      <td>control</td>\n",
+ "      <td>83.479541</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>0</td>\n",
+ "      <td>control</td>\n",
+ "      <td>78.039264</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>0</td>\n",
+ "      <td>control</td>\n",
+ "      <td>65.286167</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>0</td>\n",
+ "      <td>control</td>\n",
+ "      <td>63.589803</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>0</td>\n",
+ "      <td>control</td>\n",
+ "      <td>94.543677</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " store_id variant purchase_amount\n",
+ "0 0 control 83.479541\n",
+ "1 0 control 78.039264\n",
+ "2 0 control 65.286167\n",
+ "3 0 control 63.589803\n",
+ "4 0 control 94.543677"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Define parameters\n",
+ "n_stores = 50 # Number of stores (clusters)\n",
+ "transactions_per_store = 100 # Average transactions per store\n",
+ "\n",
+ "# Step 1: Randomly assign stores to treatment\n",
+ "stores = pd.DataFrame({\n",
+ " 'store_id': range(n_stores),\n",
+ " 'variant': np.random.choice(['control', 'treatment'], n_stores),\n",
+ "})\n",
+ "\n",
+ "# Step 2: Generate transaction-level data\n",
+ "transactions = []\n",
+ "for _, store in stores.iterrows():\n",
+ " n_transactions = np.random.poisson(transactions_per_store)\n",
+ " \n",
+ " # Base purchase amount\n",
+ " base_amount = 50\n",
+ " \n",
+ " # Treatment effect: +$5 average purchase\n",
+ " treatment_effect = 5 if store['variant'] == 'treatment' else 0\n",
+ " \n",
+ " # Store-level random effect (intra-cluster correlation)\n",
+ " store_effect = np.random.normal(0, 10)\n",
+ " \n",
+ " # Generate transactions\n",
+ " store_transactions = pd.DataFrame({\n",
+ " 'store_id': store['store_id'],\n",
+ " 'variant': store['variant'],\n",
+ " 'purchase_amount': np.random.normal(\n",
+ " base_amount + treatment_effect + store_effect, \n",
+ " 20, \n",
+ " n_transactions\n",
+ " ).clip(min=0) # No negative purchases\n",
+ " })\n",
+ " \n",
+ " transactions.append(store_transactions)\n",
+ "\n",
+ "data = pd.concat(transactions, ignore_index=True)\n",
+ "\n",
+ "print(f\"Total transactions: {len(data):,}\")\n",
+ "print(f\"Stores in control: {(stores['variant'] == 'control').sum()}\")\n",
+ "print(f\"Stores in treatment: {(stores['variant'] == 'treatment').sum()}\")\n",
+ "print(f\"\\nFirst few rows:\")\n",
+ "data.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Naive Analysis (WRONG!)\n",
+ "\n",
+ "First, let's see what happens if we ignore the clustering and use standard OLS. **This is wrong** because it doesn't account for intra-cluster correlation and will give you incorrect standard errors (typically too small, leading to false positives).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "=== Naive Analysis (Ignoring Clusters) ===\n",
+ "Treatment Effect: $4.26\n",
+ "Standard Error: $0.63\n",
+ "P-value: 0.0000\n",
+ "95% CI: [$3.03, $5.48]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Naive analysis without clustering\n",
+ "naive_plan = AnalysisPlan.from_metrics_dict({\n",
+ " 'metrics': [\n",
+ " {\n",
+ " 'alias': 'purchase_amount',\n",
+ " 'name': 'purchase_amount',\n",
+ " 'metric_type': 'simple'\n",
+ " },\n",
+ " ],\n",
+ " 'variants': [\n",
+ " {'name': 'control', 'is_control': True},\n",
+ " {'name': 'treatment', 'is_control': False},\n",
+ " ],\n",
+ " 'variant_col': 'variant',\n",
+ " 'analysis_type': 'ols', # Standard OLS (WRONG for clustered data!)\n",
+ "})\n",
+ "\n",
+ "naive_results = naive_plan.analyze(data).to_dataframe()\n",
+ "print(\"=== Naive Analysis (Ignoring Clusters) ===\")\n",
+ "print(f\"Treatment Effect: ${naive_results.iloc[0]['ate']:.2f}\")\n",
+ "print(f\"Standard Error: ${naive_results.iloc[0]['std_error']:.2f}\")\n",
+ "print(f\"P-value: {naive_results.iloc[0]['p_value']:.4f}\")\n",
+ "print(f\"95% CI: [${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Correct Analysis with Clustered Standard Errors\n",
+ "\n",
+ "Now let's do the **correct** analysis by accounting for the clustering. We'll use `clustered_ols` which computes cluster-robust standard errors.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "=== Correct Analysis (With Clustering) ===\n",
+ "Treatment Effect: $4.26\n",
+ "Standard Error: $3.04\n",
+ "P-value: 0.1610\n",
+ "95% CI: [$-1.70, $10.21]\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Correct analysis with clustered standard errors\n",
+ "clustered_plan = AnalysisPlan.from_metrics_dict({\n",
+ " 'metrics': [\n",
+ " {\n",
+ " 'alias': 'purchase_amount',\n",
+ " 'name': 'purchase_amount',\n",
+ " 'metric_type': 'simple'\n",
+ " },\n",
+ " ],\n",
+ " 'variants': [\n",
+ " {'name': 'control', 'is_control': True},\n",
+ " {'name': 'treatment', 'is_control': False},\n",
+ " ],\n",
+ " 'variant_col': 'variant',\n",
+ " 'analysis_type': 'clustered_ols', # Clustered OLS (CORRECT!)\n",
+ " 'analysis_config': {\n",
+ " 'cluster_cols': ['store_id'] # Specify the clustering variable\n",
+ " }\n",
+ "})\n",
+ "\n",
+ "clustered_results = clustered_plan.analyze(data).to_dataframe()\n",
+ "print(\"=== Correct Analysis (With Clustering) ===\")\n",
+ "print(f\"Treatment Effect: ${clustered_results.iloc[0]['ate']:.2f}\")\n",
+ "print(f\"Standard Error: ${clustered_results.iloc[0]['std_error']:.2f}\")\n",
+ "print(f\"P-value: {clustered_results.iloc[0]['p_value']:.4f}\")\n",
+ "print(f\"95% CI: [${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 4. Compare Results\n",
+ "\n",
+ "Let's compare the two approaches side by side:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "=== Comparison ===\n",
+ " Method Treatment Effect Standard Error P-value 95% CI\n",
+ " Naive (OLS) $4.26 $0.63 0.0000 [$3.03, $5.48]\n",
+ "Correct (Clustered OLS) $4.26 $3.04 0.1610 [$-1.70, $10.21]\n",
+ "\n",
+ "Notice: The clustered standard errors are LARGER, reflecting the\n",
+ "additional uncertainty from intra-cluster correlation.\n"
+ ]
+ }
+ ],
+ "source": [
+ "comparison = pd.DataFrame({\n",
+ " 'Method': ['Naive (OLS)', 'Correct (Clustered OLS)'],\n",
+ " 'Treatment Effect': [\n",
+ " f\"${naive_results.iloc[0]['ate']:.2f}\",\n",
+ " f\"${clustered_results.iloc[0]['ate']:.2f}\"\n",
+ " ],\n",
+ " 'Standard Error': [\n",
+ " f\"${naive_results.iloc[0]['std_error']:.2f}\",\n",
+ " f\"${clustered_results.iloc[0]['std_error']:.2f}\"\n",
+ " ],\n",
+ " 'P-value': [\n",
+ " f\"{naive_results.iloc[0]['p_value']:.4f}\",\n",
+ " f\"{clustered_results.iloc[0]['p_value']:.4f}\"\n",
+ " ],\n",
+ " '95% CI': [\n",
+ " f\"[${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\",\n",
+ " f\"[${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\"\n",
+ " ]\n",
+ "})\n",
+ "\n",
+ "print(\"\\n=== Comparison ===\")\n",
+ "print(comparison.to_string(index=False))\n",
+ "print(\"\\nNotice: The clustered standard errors are LARGER, reflecting the\")\n",
+ "print(\"additional uncertainty from intra-cluster correlation.\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Key Takeaways\n",
+ "\n",
+ "1. **Always account for clustering** in your analysis when randomization happens at the cluster level\n",
+ "2. **Clustered standard errors are typically larger** than naive standard errors\n",
+ "3. **Ignoring clustering leads to overstated confidence** - you might claim significance when there isn't any\n",
+ "4. **Use `clustered_ols` analysis type** and specify `cluster_cols` in the analysis config\n",
+ "\n",
+ "## When to Use Clustering\n",
+ "\n",
+ "Use clustered analysis when:\n",
+ "- ✅ Randomization is at the group level (stores, cities, schools)\n",
+ "- ✅ There are spillover effects between individuals\n",
+ "- ✅ Observations within groups are more similar than across groups\n",
+ "\n",
+ "Don't use clustering when:\n",
+ "- ❌ Randomization is truly at the individual level\n",
+ "- ❌ There's no reason to believe observations are correlated within groups\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/examples/simple_ab_test.ipynb b/docs/examples/simple_ab_test.ipynb
new file mode 100644
index 00000000..61877c71
--- /dev/null
+++ b/docs/examples/simple_ab_test.ipynb
@@ -0,0 +1,515 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Simple A/B Test Example\n",
+ "\n",
+ "This notebook demonstrates a basic A/B test analysis using `cluster-experiments`.\n",
+ "\n",
+ "## Overview\n",
+ "\n",
+ "We'll simulate an experiment where we test a new feature's impact on:\n",
+ "- **Conversions** (simple metric): Whether a user made a purchase\n",
+ "- **Conversion Rate** (ratio metric): Conversions per visit\n",
+ "- **Revenue** (simple metric): Revenue generated per user\n",
+ "\n",
+ "## Setup\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "from cluster_experiments import AnalysisPlan\n",
+ "\n",
+ "# Set random seed for reproducibility\n",
+ "np.random.seed(42)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Generate Simulated Experiment Data\n",
+ "\n",
+ "Let's create a dataset with control and treatment groups.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Dataset shape: (2000, 5)\n",
+ "\n",
+ "First few rows:\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ " user_id variant visits converted revenue\n",
+ "0 0 control 7 1 90.366149\n",
+ "1 1 treatment 14 0 0.000000\n",
+ "2 2 control 13 0 0.000000\n",
+ "3 3 control 7 0 0.000000\n",
+ "4 4 control 16 0 0.000000\n",
+ "5 5 treatment 7 0 0.000000\n",
+ "6 6 control 15 0 0.000000\n",
+ "7 7 control 12 0 0.000000\n",
+ "8 8 control 16 0 0.000000\n",
+ "9 9 treatment 8 0 0.000000"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "n_users = 2000\n",
+ "\n",
+ "# Create base data\n",
+ "data = pd.DataFrame({\n",
+ " 'user_id': range(n_users),\n",
+ " 'variant': np.random.choice(['control', 'treatment'], n_users),\n",
+ " 'visits': np.random.poisson(10, n_users), # Number of visits\n",
+ "})\n",
+ "\n",
+ "# Simulate conversions (more likely for treatment)\n",
+ "data['converted'] = (\n",
+ " np.random.binomial(1, 0.10, n_users) | # Base conversion rate\n",
+ " (data['variant'] == 'treatment') & np.random.binomial(1, 0.03, n_users) # +3% for treatment\n",
+ ").astype(int)\n",
+ "\n",
+ "# Simulate revenue (higher for converters and treatment)\n",
+ "data['revenue'] = 0.0\n",
+ "converters = data['converted'] == 1\n",
+ "data.loc[converters, 'revenue'] = np.random.gamma(shape=2, scale=25, size=converters.sum())\n",
+ "\n",
+ "# Treatment group gets slightly higher revenue\n",
+ "treatment_converters = (data['variant'] == 'treatment') & converters\n",
+ "data.loc[treatment_converters, 'revenue'] *= 1.15\n",
+ "\n",
+ "print(f\"Dataset shape: {data.shape}\")\n",
+ "print(f\"\\nFirst few rows:\")\n",
+ "data.head(10)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Define Analysis Plan\n",
+ "\n",
+ "Now let's define our analysis plan with multiple metrics:\n",
+ "- **conversions**: Simple metric for the share of users who converted (mean of `converted`)\n",
+ "- **conversion_rate**: Ratio metric (conversions / visits)\n",
+ "- **revenue**: Simple metric for average revenue per user\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Analysis plan created successfully!\n"
+ ]
+ }
+ ],
+ "source": [
+ "from cluster_experiments import (\n",
+ " AnalysisPlan, SimpleMetric, RatioMetric, \n",
+ " Variant, HypothesisTest\n",
+ ")\n",
+ "\n",
+ "# Define metrics by type\n",
+ "simple_metrics = {\n",
+ " \"conversions\": \"converted\", # alias: column_name\n",
+ " \"revenue\": \"revenue\"\n",
+ "}\n",
+ "\n",
+ "ratio_metrics = {\n",
+ " \"conversion_rate\": {\n",
+ " \"numerator\": \"converted\",\n",
+ " \"denominator\": \"visits\"\n",
+ " }\n",
+ "}\n",
+ "\n",
+ "# Define variants\n",
+ "variants = [\n",
+ " Variant(\"control\", is_control=True),\n",
+ " Variant(\"treatment\", is_control=False)\n",
+ "]\n",
+ "\n",
+ "# Build hypothesis tests\n",
+ "hypothesis_tests = []\n",
+ "\n",
+ "# Ratio metrics: use delta method\n",
+ "for alias, config in ratio_metrics.items():\n",
+ "    metric = RatioMetric(\n",
+ "        alias=alias,\n",
+ "        numerator_name=config[\"numerator\"],\n",
+ "        denominator_name=config[\"denominator\"]\n",
+ "    )\n",
+ "    hypothesis_tests.append(\n",
+ "        HypothesisTest(\n",
+ "            metric=metric,\n",
+ "            analysis_type=\"delta\",\n",
+ "            analysis_config={\n",
+ "                \"scale_col\": metric.denominator_name,\n",
+ "                \"cluster_cols\": [\"user_id\"]\n",
+ "            }\n",
+ "        )\n",
+ "    )\n",
+ "\n",
+ "# Simple metrics: use OLS\n",
+ "for alias, column_name in simple_metrics.items():\n",
+ "    metric = SimpleMetric(alias=alias, name=column_name)\n",
+ "    hypothesis_tests.append(\n",
+ "        HypothesisTest(\n",
+ "            metric=metric,\n",
+ "            analysis_type=\"ols\"\n",
+ "        )\n",
+ "    )\n",
+ "\n",
+ "# Create analysis plan\n",
+ "analysis_plan = AnalysisPlan(\n",
+ " tests=hypothesis_tests,\n",
+ " variants=variants,\n",
+ " variant_col='variant'\n",
+ ")\n",
+ "\n",
+ "print(\"Analysis plan created successfully!\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3. Run Analysis\n",
+ "\n",
+ "Let's run the analysis and generate a comprehensive scorecard.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "=== Experiment Results ===\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/luiz.henrique/Documents/GitHub/cluster-experiments/.venv/lib/python3.9/site-packages/cluster_experiments/experiment_analysis.py:1671: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n",
+ " return df.groupby(self.treatment_col).apply(\n",
+ "/Users/luiz.henrique/Documents/GitHub/cluster-experiments/.venv/lib/python3.9/site-packages/cluster_experiments/experiment_analysis.py:1676: UserWarning: Delta Method approximation may not be accurate for small group sizes\n",
+ " warnings.warn(\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ " metric_alias control_variant_name treatment_variant_name \\\n",
+ "0 conversion_rate control treatment \n",
+ "1 conversions control treatment \n",
+ "2 revenue control treatment \n",
+ "\n",
+ " control_variant_mean treatment_variant_mean analysis_type ate \\\n",
+ "0 0.009972 0.011912 delta 0.001940 \n",
+ "1 0.100394 0.117886 ols 0.017492 \n",
+ "2 5.451515 7.359327 ols 1.907812 \n",
+ "\n",
+ " ate_ci_lower ate_ci_upper p_value std_error dimension_name \\\n",
+ "0 -0.000825 0.004706 0.169006 0.001411 __total_dimension \n",
+ "1 -0.009874 0.044859 0.210285 0.013963 __total_dimension \n",
+ "2 -0.130488 3.946112 0.066581 1.039968 __total_dimension \n",
+ "\n",
+ " dimension_value alpha \n",
+ "0 total 0.05 \n",
+ "1 total 0.05 \n",
+ "2 total 0.05 "
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Run analysis\n",
+ "results = analysis_plan.analyze(data)\n",
+ "\n",
+ "# View results as a dataframe\n",
+ "results_df = results.to_dataframe()\n",
+ "print(\"\\n=== Experiment Results ===\")\n",
+ "results_df\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Summary\n",
+ "\n",
+ "This example demonstrated:\n",
+ "\n",
+ "1. ✅ **Data Simulation**: Creating realistic experiment data\n",
+ "2. ✅ **Multiple Metric Types**: Analyzing both simple and ratio metrics\n",
+ "3. ✅ **Easy Configuration**: Defining metrics in plain dictionaries and building the analysis plan from them\n",
+ "4. ✅ **Comprehensive Results**: Getting treatment effects, confidence intervals, and p-values\n",
+ "\n",
+ "## Next Steps\n",
+ "\n",
+ "- Try the [CUPAC example](../cupac_example.html) to learn about variance reduction\n",
+ "- Explore [cluster randomization](cluster_randomization.html) for handling correlated units\n",
+ "- Learn about [switchback experiments](../switchback.html) for time-based designs\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.6"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/license.md b/docs/license.md
new file mode 100644
index 00000000..1731b8a3
--- /dev/null
+++ b/docs/license.md
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2022 David Masip
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/docs/quick_start_power_curve.png b/docs/quick_start_power_curve.png
new file mode 100644
index 00000000..b8df592e
Binary files /dev/null and b/docs/quick_start_power_curve.png differ
diff --git a/docs/quickstart.md b/docs/quickstart.md
new file mode 100644
index 00000000..ee06e296
--- /dev/null
+++ b/docs/quickstart.md
@@ -0,0 +1,287 @@
+# Quickstart
+
+Get started with `cluster-experiments` in minutes! This guide will walk you through installation and your first experiment analysis.
+
+---
+
+## Installation
+
+Install via pip:
+
+```bash
+pip install cluster-experiments
+```
+
+!!! info "Requirements"
+    - **Python 3.8 or higher**
+    - Main dependencies: `pandas`, `numpy`, `scipy`, `statsmodels`
+
+---
+
+## 1. Your First Analysis
+
+Let's analyze a simple A/B test with multiple metrics. This is the most common use case. *See [Simple A/B Test](examples/simple_ab_test.html) for a complete walkthrough.*
+
+
+```python
+import pandas as pd
+import numpy as np
+from cluster_experiments import AnalysisPlan, Variant
+
+# 1. Set seed for reproducibility
+np.random.seed(42)
+
+# 2. Create simulated data
+N = 1_000
+df = pd.DataFrame({
+    "variant": np.random.choice(["control", "treatment"], N),
+    "orders": np.random.poisson(10, N),
+    "visits": np.random.poisson(100, N),
+})
+# Add some treatment effect to orders
+df.loc[df["variant"] == "treatment", "orders"] += np.random.poisson(1, df[df["variant"] == "treatment"].shape[0])
+
+df["converted"] = (df["orders"] > 0).astype(int)
+df["cost"] = np.random.normal(50, 10, N)  # New metric: cost
+df["clicks"] = np.random.poisson(200, N)  # New metric: clicks
+
+# 3. Define your analysis plan
+plan = AnalysisPlan.from_metrics_dict({
+    "metrics": [
+        {"name": "orders", "alias": "revenue", "metric_type": "simple"},
+        {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"},
+        {"name": "cost", "alias": "avg_cost", "metric_type": "simple"},
+        {"name": "clicks", "alias": "ctr", "metric_type": "ratio", "numerator": "clicks", "denominator": "visits"}
+    ],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False}
+    ],
+    "variant_col": "variant",
+    "analysis_type": "ols"
+})
+
+# 4. Run analysis on your dataframe
+results = plan.analyze(df)
+print(results.to_dataframe().head())
+```
+
+**Output:**
+```
+ metric_alias control_variant_name treatment_variant_name control_variant_mean treatment_variant_mean analysis_type ate ate_ci_lower ate_ci_upper p_value std_error dimension_name dimension_value alpha
+0 revenue control treatment 9.973469 10.994118 ols 1.020648e+00 6.140829e-01 1.427214e+00 8.640027e-07 2.074351e-01 __total_dimension total 0.05
+1 conversion control treatment 1.000000 1.000000 ols -4.163336e-16 -5.971983e-16 -2.354689e-16 6.432406e-06 9.227960e-17 __total_dimension total 0.05
+2 avg_cost control treatment 49.463206 49.547386 ols 8.417999e-02 -1.222365e+00 1.390725e+00 8.995107e-01 6.666166e-01 __total_dimension total 0.05
+3 ctr control treatment 199.795918 199.692157 ols -1.037615e-01 -1.767938e+00 1.560415e+00 9.027376e-01 8.490855e-01 __total_dimension total 0.05
+```
+
+---
+
+### 1.1. Understanding Your Results
+
+The results dataframe includes:
+
+| Column | Description |
+|--------|-------------|
+| `metric_alias` | Alias of the metric being analyzed |
+| `control_variant_mean` | Average value in the control group |
+| `treatment_variant_mean` | Average value in the treatment group |
+| `ate` | Average Treatment Effect (absolute difference) |
+| `ate_ci_lower` / `ate_ci_upper` | Bounds of the 95% confidence interval for the ATE (`alpha` = 0.05) |
+| `p_value` | p-value of the test (below 0.05 is significant at the 5% level) |
+| `std_error` | Standard error of the ATE estimate |
+
+!!! tip "Interpreting Results"
+    - **p_value < 0.05**: The result is statistically significant at the 5% level
+    - **Confidence interval**: If the 95% interval does not include 0, the effect is significant at the 5% level
+
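+As a rough cross-check, the interval and p-value can be recomputed from `ate` and `std_error` under a normal approximation. This is a minimal sketch, not the library's implementation: the helper `normal_ci_and_p_value` is hypothetical, and the inputs are the `revenue` row from the example output in section 1.
+
+```python
+from statistics import NormalDist
+
+
+def normal_ci_and_p_value(ate, std_error, alpha=0.05):
+    """Two-sided CI and p-value for an effect estimate (normal approximation)."""
+    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
+    ci_lower = ate - z_crit * std_error
+    ci_upper = ate + z_crit * std_error
+    # Two-sided p-value: probability of a |z| at least this large under H0
+    p_value = 2 * (1 - NormalDist().cdf(abs(ate / std_error)))
+    return ci_lower, ci_upper, p_value
+
+
+# `revenue` row from the example output above
+ci_lower, ci_upper, p_value = normal_ci_and_p_value(ate=1.020648, std_error=0.2074351)
+is_significant = p_value < 0.05 and not (ci_lower <= 0 <= ci_upper)
+print(ci_lower, ci_upper, p_value, is_significant)
+```
+
+For this row the two criteria agree: the p-value is far below 0.05 and the interval excludes 0.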
+
+---
+
+### 1.2. Analysis Extensions: Ratio Metrics
+
+`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value), as seen in the first example:
+
+```python
+# Ratio metric: conversions / visits
+{
+    'alias': 'conversion_rate',
+    'metric_type': 'ratio',
+    'numerator_name': 'converted',  # Numerator column
+    'denominator_name': 'visits'    # Denominator column
+}
+```
+
+The library handles the statistical complexities of ratio metrics with the Delta Method, available as the `delta` analysis type.
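+
+For intuition, the Delta Method approximates the variance of a ratio of means with a first-order Taylor expansion. The sketch below is illustrative only: `delta_ratio_variance` is a hypothetical helper, not part of `cluster-experiments`, and it assumes one row per independent unit.
+
+```python
+from statistics import mean
+
+
+def delta_ratio_variance(numerators, denominators):
+    """Approximate Var(mean(num) / mean(den)) with the Delta Method."""
+    n = len(numerators)
+    mu_num, mu_den = mean(numerators), mean(denominators)
+    ratio = mu_num / mu_den
+    # Sample variances and covariance (n - 1 in the denominator)
+    var_num = sum((y - mu_num) ** 2 for y in numerators) / (n - 1)
+    var_den = sum((x - mu_den) ** 2 for x in denominators) / (n - 1)
+    cov = sum(
+        (y - mu_num) * (x - mu_den) for y, x in zip(numerators, denominators)
+    ) / (n - 1)
+    # First-order Taylor expansion of num / den around the two means
+    return (var_num - 2 * ratio * cov + ratio**2 * var_den) / (n * mu_den**2)
+
+
+converted = [1, 0, 1, 0]
+visits = [2, 4, 2, 4]
+ratio_variance = delta_ratio_variance(converted, visits)
+print(ratio_variance)
+```
+
+The covariance term is what a naive per-row variance would miss: numerator and denominator come from the same units, so they are generally correlated.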
+
+### 1.3. Analysis Extensions: Multi-dimensional Analysis
+
+Slice your results by dimensions (e.g., city, device type):
+
+```python
+analysis_plan = AnalysisPlan.from_metrics_dict({
+    'metrics': [...],
+    'variants': [...],
+    'variant_col': 'variant',
+    'dimensions': [
+        {'name': 'city', 'values': ['NYC', 'LA', 'Chicago']},
+        {'name': 'device', 'values': ['mobile', 'desktop']},
+    ],
+    'analysis_type': 'ols',
+})
+```
+
+Results will include treatment effects for each dimension slice.
+
+---
+
+## 2. Power Analysis
+
+Before running an experiment, it's crucial to know how long it needs to run to detect a significant effect.
+See the [Power Analysis Guide](power_analysis_guide.html) for more complex designs (switchback, cluster randomization) and simulation methods.
+
+### 2.1. MDE
+
+Calculate the Minimum Detectable Effect (MDE) for a given sample size ($N$), significance level ($\alpha$), and type II error rate ($\beta$).
+
+```python
+import pandas as pd
+import numpy as np
+from cluster_experiments import NormalPowerAnalysis
+
+# Create sample historical data
+np.random.seed(42)
+N = 500
+historical_data = pd.DataFrame({
+    'user_id': range(N),
+    'metric': np.random.normal(100, 20, N),
+    'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d')
+})
+
+power_analysis = NormalPowerAnalysis.from_dict({
+    'analysis': 'ols',
+    'splitter': 'non_clustered',
+    'target_col': 'metric',
+    'time_col': 'date'
+})
+
+mde = power_analysis.mde(historical_data, power=0.8)
+print(f"Minimum Detectable Effect: {mde}")
+# Output: Minimum Detectable Effect: 4.935302024560818
+```
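+
+Under a normal approximation, the MDE is the standard error of the effect estimate multiplied by the sum of two normal quantiles. A minimal sketch follows, assuming a 50/50 split, equal variances and $\alpha = 0.05$. `normal_mde` is a hypothetical helper, and it is not expected to reproduce the library's number exactly, since that depends on the sample standard deviation and the random split.
+
+```python
+from math import sqrt
+from statistics import NormalDist
+
+
+def normal_mde(std, n_control, n_treatment, alpha=0.05, power=0.8):
+    """MDE for a difference in means under a normal approximation."""
+    std_error = std * sqrt(1 / n_control + 1 / n_treatment)
+    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
+    z_power = NormalDist().inv_cdf(power)
+    return (z_alpha + z_power) * std_error
+
+
+# Population values used to simulate `historical_data`: std of 20, 500 users
+mde_sketch = normal_mde(std=20, n_control=250, n_treatment=250)
+print(f"Approximate MDE: {mde_sketch:.2f}")
+```
+
+The result is about 5.0, in the same range as the value reported above.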
+
+### 2.2. Calculate Power
+
+Calculate the statistical power for a specific effect size you expect to see.
+
+```python
+power = power_analysis.power_analysis(historical_data, average_effect=3.5)
+print(f"Power: {power}")
+# Output: Power: 0.510914982752414
+```
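+
+The same approximation can be inverted to get power for a given effect. This is again a sketch under assumed inputs (50/50 split, standard deviation of 20, $\alpha = 0.05$), with a hypothetical helper `normal_power` rather than the library's code.
+
+```python
+from math import sqrt
+from statistics import NormalDist
+
+
+def normal_power(effect, std, n_control, n_treatment, alpha=0.05):
+    """Two-sided power for a difference in means (normal approximation)."""
+    std_error = std * sqrt(1 / n_control + 1 / n_treatment)
+    normal = NormalDist()
+    z_alpha = normal.inv_cdf(1 - alpha / 2)
+    z_effect = effect / std_error
+    # Probability of rejecting in either tail when the true effect is `effect`
+    return normal.cdf(z_effect - z_alpha) + normal.cdf(-z_effect - z_alpha)
+
+
+power_sketch = normal_power(effect=3.5, std=20, n_control=250, n_treatment=250)
+print(f"Approximate power: {power_sketch:.2f}")
+```
+
+An effect of 3.5 is well below the MDE at 80% power, so power close to 0.5 is consistent with the previous section.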
+
+### 2.3. Visualize Power Curve
+
+It's helpful to visualize how power changes with effect size.
+
+```python
+import matplotlib.pyplot as plt
+
+# Calculate power for multiple effect sizes
+effect_sizes = [2.0, 4.0, 6.0, 8.0, 10.0]
+power_curve = power_analysis.power_line(
+    historical_data,
+    average_effects=effect_sizes
+)
+
+# Plotting
+plt.figure(figsize=(10, 6))
+plt.plot(power_curve['average_effect'], power_curve['power'], marker='o')
+plt.title('Power Analysis: Effect Size vs Power')
+plt.xlabel('Effect Size')
+plt.ylabel('Power')
+plt.grid(True)
+plt.show()
+```
+
+
+
+
+
+---
+## 3. Quick Reference
+
+### 3.1. Analysis Types
+
+Choose the appropriate analysis method:
+
+| Analysis Type | When to Use |
+|--------------|-------------|
+| `ols` | Standard A/B test, individual randomization |
+| `clustered_ols` | Cluster randomization (stores, cities, etc.) |
+| `gee` | Repeated measures, correlated observations |
+| `mlm` | Multi-level/hierarchical data |
+| `synthetic_control` | Observational studies, no randomization |
+
+
+### 3.2. Dictionary vs Class-Based API
+
+`cluster-experiments` offers two ways to define analysis plans, catering to different needs:
+
+#### 3.2.1. Dictionary Configuration
+
+Best for storing configurations in YAML/JSON files and automated pipelines.
+
+```python
+config = {
+    "metrics": [
+        {"name": "orders", "alias": "revenue", "metric_type": "simple"},
+        {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"}
+    ],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False}
+    ],
+    "variant_col": "variant",
+    "analysis_type": "ols"
+}
+
+plan = AnalysisPlan.from_metrics_dict(config)
+```
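+
+Because the configuration is a plain dictionary, it can round-trip through JSON. The sketch below assumes the config is kept as JSON text (an inline string stands in for a file), uses a trimmed version of the configuration above, and leaves the library call as a comment so the snippet runs without `cluster-experiments` installed.
+
+```python
+import json
+
+# Trimmed version of the configuration above, stored as JSON text
+config_json = """
+{
+  "metrics": [
+    {"name": "orders", "alias": "revenue", "metric_type": "simple"}
+  ],
+  "variants": [
+    {"name": "control", "is_control": true},
+    {"name": "treatment", "is_control": false}
+  ],
+  "variant_col": "variant",
+  "analysis_type": "ols"
+}
+"""
+
+config = json.loads(config_json)  # JSON true/false become Python True/False
+print(config["variants"])
+
+# With the library installed, the parsed dict can be passed on unchanged:
+# plan = AnalysisPlan.from_metrics_dict(config)
+```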
+
+#### 3.2.2. Class-Based API
+
+Best for exploration and custom extensions.
+
+```python
+from cluster_experiments import AnalysisPlan, HypothesisTest, SimpleMetric, Variant
+
+# Explicitly define objects
+revenue_metric = SimpleMetric(name="orders", alias="revenue")
+control = Variant("control", is_control=True)
+treatment = Variant("treatment", is_control=False)
+
+plan = AnalysisPlan(
+    tests=[HypothesisTest(metric=revenue_metric, analysis_type="ols")],
+    variants=[control, treatment],
+    variant_col='variant'
+)
+```
+
+
+
+## Next Steps
+
+Now that you've completed your first analysis, explore:
+
+- 📖 **[API Reference](api/experiment_analysis.html)** - Detailed documentation for all classes
+- **[Example Gallery](cupac_example.html)** - Real-world use cases and patterns
+- **[Power Analysis Guide](power_analysis_guide.html)** - Design experiments with confidence
+- 🤝 **[Contributing](../CONTRIBUTING.md)** - Help improve the library
+
+---
+
+## Getting Help
+
+- 📝 [Documentation](https://david26694.github.io/cluster-experiments/)
+- 🐛 [Report Issues](https://github.com/david26694/cluster-experiments/issues)
+- 💬 [Discussions](https://github.com/david26694/cluster-experiments/discussions)
diff --git a/docs/stylesheets/overrides.css b/docs/stylesheets/overrides.css
new file mode 100644
index 00000000..bf676b78
--- /dev/null
+++ b/docs/stylesheets/overrides.css
@@ -0,0 +1,13 @@
+/* Custom admonition styling */
+.md-typeset .admonition {
+  border-radius: 8px;
+  border-left: 4px solid var(--md-primary-fg-color);
+  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
+}
+
+/* Code block styling */
+.md-typeset pre {
+  border-radius: 8px;
+  background-color: var(--md-code-bg-color);
+  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1);
+}
diff --git a/docs/stylesheets/style.css b/docs/stylesheets/style.css
new file mode 100644
index 00000000..c22863bb
--- /dev/null
+++ b/docs/stylesheets/style.css
@@ -0,0 +1,9 @@
+/* Apply text justification to all paragraphs in the documentation */
+.md-content p {
+  text-align: justify;
+}
+
+/* Optionally, justify lists or other specific elements */
+.md-content ul, .md-content ol {
+  text-align: justify;
+}
diff --git a/mkdocs.yml b/mkdocs.yml
index cdb81af7..79d40e9f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,66 +1,105 @@
site_name: Cluster Experiments Docs
-extra_css: [style.css]
repo_url: https://github.com/david26694/cluster-experiments
site_url: https://david26694.github.io/cluster-experiments/
site_description: Functions to design and run clustered experiments
site_author: David Masip
use_directory_urls: false
edit_uri: blob/main/docs/
+docs_dir: docs
+site_dir: site
+
nav:
- - Home:
- - Index: index.md
- - End-to-end example: e2e_mde.ipynb
- - Cupac example: cupac_example.ipynb
- - Custom classes: create_custom_classes.ipynb
- - Switchback:
- - Stratified switchback: switchback.ipynb
- - Switchback calendar visualization: plot_calendars.ipynb
- - Visualization - 4-hour switches: plot_calendars_hours.ipynb
- - E2E switchback design example: e2e_mde_switchback.ipynb
- - Multiple treatments: multivariate.ipynb
- - AA test clustered: aa_test.ipynb
- - Paired T test: paired_ttest.ipynb
- - Different hypotheses tests: analysis_with_different_hypotheses.ipynb
- - Washover: washover_example.ipynb
- - Normal Power:
- - Compare with simulation: normal_power.ipynb
- - Time-lines: normal_power_lines.ipynb
- - Synthetic control: synthetic_control.ipynb
- - Experiment analysis workflow: experiment_analysis.ipynb
- - Delta method:
- - Delta Method Analysis: delta_method.ipynb
- - End-to-end delta method example: e2e_mde_delta.ipynb
- - API:
- - Experiment analysis methods: api/experiment_analysis.md
+ - Home: index.md
+ - Quickstart:
+ - Quickstart: quickstart.md
+ - Power Analysis Guide: normal_power_lines.ipynb
+
+ - Examples:
+ - Basic Usage:
+ - Simple A/B Test: examples/simple_ab_test.ipynb
+ - Experiment Analysis Workflow: experiment_analysis.ipynb
+ - AA Test (Clustered): aa_test.ipynb
+ - Analysis Methods:
+ - Different Hypothesis Tests: analysis_with_different_hypotheses.ipynb
+ - Paired T-Test: paired_ttest.ipynb
+ - Delta Method Analysis: delta_method.ipynb
+ - Variance Reduction:
+ - CUPAC Example: cupac_example.ipynb
+ - Cluster Experiments:
+ - Cluster Randomization: examples/cluster_randomization.ipynb
+ - Switchback Experiments:
+ - Stratified Switchback: switchback.ipynb
+ - Calendar Visualization: plot_calendars.ipynb
+ - 4-Hour Switches: plot_calendars_hours.ipynb
+ - Washover Example: washover_example.ipynb
+ - Power Analysis:
+ - Normal Power Comparison: normal_power.ipynb
+ - Power Time-Lines: normal_power_lines.ipynb
+ - Advanced Topics:
+ - Multiple Treatments: multivariate.ipynb
+ - Synthetic Control: synthetic_control.ipynb
+ - Custom Classes: create_custom_classes.ipynb
+
+ - API Reference:
+ - Experiment Analysis:
+ - Analysis Plan: api/analysis_plan.md
+ - Analysis Results: api/analysis_results.md
+ - Experiment Analysis Methods: api/experiment_analysis.md
+ - Hypothesis Test: api/hypothesis_test.md
+ - Metrics & Variants:
+ - Metric: api/metric.md
+ - Variant: api/variant.md
+ - Dimension: api/dimension.md
+ - Power Analysis:
+ - Power Analysis: api/power_analysis.md
+ - Power Config: api/power_config.md
+ - Randomization:
+ - Splitters: api/random_splitter.md
+ - Variance Reduction:
+ - CUPAC Model: api/cupac_model.md
+ - Switchback:
+ - Washover: api/washover.md
- Perturbators: api/perturbator.md
- - Splitter: api/random_splitter.md
- - Pre experiment outcome model: api/cupac_model.md
- - Power config: api/power_config.md
- - Power analysis: api/power_analysis.md
- - Washover: api/washover.md
- - Metric: api/metric.md
- - Variant: api/variant.md
- - Dimension: api/dimension.md
- - Hypothesis Test: api/hypothesis_test.md
- - Analysis Plan: api/analysis_plan.md
+
+ - Contributing: CONTRIBUTING.md
+ - License: license.md
+
+extra:
+ social:
+ - icon: fontawesome/brands/github
+ link: https://github.com/david26694/cluster-experiments
+
plugins:
- mkdocstrings:
watch:
- cluster_experiments
- mkdocs-jupyter
- search
+
+extra_css:
+ - stylesheets/overrides.css
+ - stylesheets/style.css
+
copyright: Copyright © 2022 Maintained by David Masip.
+
theme:
name: material
font:
text: Ubuntu
code: Ubuntu Mono
- feature:
- tabs: true
+ features:
+ - content.tabs
+ - content.code.annotate
+ - content.code.copy
+ - navigation.instant
+ - navigation.tracking
+ - navigation.top
palette:
primary: indigo
accent: blue
+
markdown_extensions:
+ - admonition
- codehilite
- pymdownx.inlinehilite
- pymdownx.superfences