diff --git a/.gitignore b/.gitignore index fd1a6a21..e40cd467 100644 --- a/.gitignore +++ b/.gitignore @@ -170,3 +170,4 @@ todos.txt # experiments/ +cluster-experiments.code-workspace diff --git a/README.md b/README.md index f770fc7e..64b061ad 100644 --- a/README.md +++ b/README.md @@ -1,275 +1,284 @@ -# cluster_experiments +# cluster-experiments [![Downloads](https://static.pepy.tech/badge/cluster-experiments)](https://pepy.tech/project/cluster-experiments) -[![PyPI](https://img.shields.io/pypi/v/cluster-experiments)]( -https://pypi.org/project/cluster-experiments/) +[![PyPI](https://img.shields.io/pypi/v/cluster-experiments)](https://pypi.org/project/cluster-experiments/) [![Unit tests](https://github.com/david26694/cluster-experiments/workflows/Release%20unit%20Tests/badge.svg)](https://github.com/david26694/cluster-experiments/actions) -[![CodeCov]( -https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg)](https://app.codecov.io/gh/david26694/cluster-experiments/) +[![CodeCov](https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg)](https://app.codecov.io/gh/david26694/cluster-experiments/) ![License](https://img.shields.io/github/license/david26694/cluster-experiments) [![Pypi version](https://img.shields.io/pypi/pyversions/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments) -A Python library for end-to-end A/B testing workflows, featuring: -- Experiment analysis and scorecards -- Power analysis (simulation-based and normal approximation) -- Variance reduction techniques (CUPED, CUPAC) -- Support for complex experimental designs (cluster randomization, switchback experiments) -## Key Features +**`cluster-experiments`** is a comprehensive Python library for **end-to-end A/B testing workflows**, from experiment design to statistical analysis. -### 1. Power Analysis -- **Simulation-based**: Run Monte Carlo simulations to estimate power -- **Normal approximation**: Fast power estimation using CLT -- **Minimum Detectable Effect**: Calculate required effect sizes -- **Multiple designs**: Support for: - - Simple randomization - - Variance reduction techniques in power analysis - - Cluster randomization - - Switchback experiments -- **Dict config**: Easy to configure power analysis with a dictionary - -### 2. Experiment Analysis -- **Analysis Plans**: Define structured analysis plans -- **Metrics**: - - Simple metrics - - Ratio metrics -- **Dimensions**: Slice results by dimensions -- **Statistical Methods**: - - GEE - - Mixed Linear Models - - Clustered / regular OLS - - T-tests - - Synthetic Control -- **Dict config**: Easy to define analysis plans with a dictionary - -### 3. Variance Reduction -- **CUPED** (Controlled-experiment Using Pre-Experiment Data): - - Use historical outcome data to reduce variance, choose any granularity - - Support for several covariates -- **CUPAC** (Control Using Predictors as Covariates): - - Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data - -## Quick Start - -### Power Analysis Example +## 📖 What is cluster-experiments? -```python -import numpy as np -import pandas as pd -from cluster_experiments import PowerAnalysis, NormalPowerAnalysis +`cluster-experiments` provides a complete toolkit for designing, running, and analyzing experiments, with particular strength in handling **clustered randomization** and complex experimental designs. 
Originally developed to address challenges in **switchback experiments** and scenarios with **network effects** where standard randomization isn't feasible, it has evolved into a general-purpose experimentation framework supporting both simple A/B tests and other randomization designs.

-# Create sample data
-N = 1_000
-df = pd.DataFrame({
-    "target": np.random.normal(0, 1, size=N),
-    "date": pd.to_datetime(
-        np.random.randint(
-            pd.Timestamp("2024-01-01").value,
-            pd.Timestamp("2024-01-31").value,
-            size=N,
-        )
-    ),
-})
+### Why "cluster"?

-# Simulation-based power analysis with CUPED
-config = {
-    "analysis": "ols",
-    "perturbator": "constant",
-    "splitter": "non_clustered",
-    "n_simulations": 50,
-}
-pw = PowerAnalysis.from_dict(config)
-power = pw.power_analysis(df, average_effect=0.1)
-
-# Normal approximation (faster)
-npw = NormalPowerAnalysis.from_dict({
-    "analysis": "ols",
-    "splitter": "non_clustered",
-    "n_simulations": 5,
-    "time_col": "date",
-})
-power_normal = npw.power_analysis(df, average_effect=0.1)
-power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])
+The name reflects the library's origins in handling **cluster-randomized experiments**, where randomization happens at a group level (e.g., stores, cities, time periods) rather than at the individual level. This is critical when:
+- **Spillover/Network Effects**: Treatment of one unit affects others (e.g., testing driver incentives in ride-sharing)
+- **Operational Constraints**: You can't randomize individuals (e.g., testing restaurant menu changes)
+- **Switchback Designs**: Treatment alternates over time periods within the same unit

-# MDE calculation
-mde = npw.mde(df, power=0.8)
+While the library is aimed at these scenarios, it's equally capable of handling standard A/B tests with individual-level randomization.

-# MDE line with length
-mde_timeline = npw.mde_time_line(
-    df,
-    powers=[0.8],
-    experiment_length=[7, 14, 21]
-)
+---
+
+## Key Features

-print(power, power_line_normal, power_normal, mde, mde_timeline)
+### **Experiment Design**
+- **Power Analysis & Sample Size Calculation**
+  - Simulation-based (Monte Carlo) for any design complexity
+  - Analytical (CLT-based) for standard designs
+  - Minimum Detectable Effect (MDE) estimation
+
+- **Multiple Experimental Designs**
+  - Standard A/B tests with individual randomization
+  - Cluster-randomized experiments
+  - Switchback/crossover experiments
+  - Stratified randomization
+  - Observational studies with Synthetic Control
+
+### **Statistical Methods**
+- **Multiple Analysis Methods**
+  - OLS and Clustered OLS regression
+  - GEE (Generalized Estimating Equations)
+  - Mixed Linear Models (MLM)
+  - Delta Method for ratio metrics
+  - Synthetic Control for observational data
+
+- **Variance Reduction Techniques**
+  - CUPED (Controlled-experiment Using Pre-Experiment Data)
+  - CUPAC (Control Using Predictors as Covariates)
+  - Covariate adjustment
+
+### **Analysis Workflow**
+- **Scorecard Generation**: Analyze multiple metrics simultaneously
+- **Multi-dimensional Slicing**: Break down results by segments
+- **Multiple Treatment Arms**: Compare several treatments at once
+- **Ratio Metrics**: Built-in support for conversion rates, averages, etc.
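+
+For example, the scorecard features above compose through a single plan config: slicing results by a dimension is one extra `dimensions` key. A minimal sketch (column names are illustrative; the dict keys mirror the Quick Example below):
+
+```python
+from cluster_experiments import AnalysisPlan
+
+# Scorecard with one metric, sliced by city: the results dataframe contains
+# one row per metric and dimension value, plus an overall "total" row
+plan = AnalysisPlan.from_metrics_dict({
+    "metrics": [{"name": "order_value", "alias": "AOV", "metric_type": "simple"}],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False},
+    ],
+    "variant_col": "variant",
+    "dimensions": [{"name": "city", "values": ["NYC", "LA"]}],
+    "analysis_type": "ols",
+})
+```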
+ +--- + +## 📦 Installation + +```bash +pip install cluster-experiments ``` -### Experiment Analysis Example +--- + +## ⚡ Quick Example + +Here's how to run an analysis in just a few lines: ```python -import numpy as np import pandas as pd -from cluster_experiments import AnalysisPlan +import numpy as np +from cluster_experiments import AnalysisPlan, Variant + +np.random.seed(42) +# 0. Create simple data N = 1_000 -experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "delivery_time": np.random.normal(10, 1, size=N), - "experiment_group": np.random.choice(["control", "treatment"], size=N), - "city": np.random.choice(["NYC", "LA"], size=N), - "customer_id": np.random.randint(1, 100, size=N), - "customer_age": np.random.randint(20, 60, size=N), +df = pd.DataFrame({ + "variant": np.random.choice(["control", "treatment"], N), + "orders": np.random.poisson(10, N), + "visits": np.random.poisson(100, N), }) +df["converted"] = (df["orders"] > 0).astype(int) -# Create analysis plan + +# 1. Define your analysis plan plan = AnalysisPlan.from_metrics_dict({ "metrics": [ - {"alias": "AOV", "name": "order_value"}, - {"alias": "delivery_time", "name": "delivery_time"}, + {"name": "orders", "alias": "revenue", "metric_type": "simple"}, + {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"} ], "variants": [ {"name": "control", "is_control": True}, - {"name": "treatment", "is_control": False}, - ], - "variant_col": "experiment_group", - "alpha": 0.05, - "dimensions": [ - {"name": "city", "values": ["NYC", "LA"]}, + {"name": "treatment", "is_control": False} ], - "analysis_type": "clustered_ols", - "analysis_config": {"cluster_cols": ["customer_id"]}, + "variant_col": "variant", + "analysis_type": "ols" }) -# Run analysis -print(plan.analyze(experiment_data).to_dataframe()) + +# 2. Run analysis on your dataframe +results = plan.analyze(df) +print(results.to_dataframe().head()) +``` + +**Output Example**: ``` + metric_alias control_variant_name treatment_variant_name control_variant_mean treatment_variant_mean analysis_type ate ate_ci_lower ate_ci_upper p_value std_error dimension_name dimension_value alpha +0 revenue control treatment 10.08554 9.941061 ols -1.444788e-01 -5.446603e-01 2.557026e-01 0.479186 2.041780e-01 __total_dimension total 0.05 +1 conversion control treatment 1.00000 1.000000 ols 1.110223e-16 -1.096504e-16 3.316950e-16 0.324097 1.125902e-16 __total_dimension total 0.05 +``` + +--- -### Variance Reduction Example +## Power Analysis + +Design your experiment by estimating required sample size and detectable effects. 
Here's a complete example using **analytical (CLT-based) power analysis**:

```python
import numpy as np
import pandas as pd
-from cluster_experiments import (
-    AnalysisPlan,
-    SimpleMetric,
-    Variant,
-    Dimension,
-    TargetAggregation,
-    HypothesisTest
-)
+from cluster_experiments import NormalPowerAnalysis

-N = 1000
+# Create sample historical data
+np.random.seed(42)
+N = 500

-experiment_data = pd.DataFrame({
-    "order_value": np.random.normal(100, 10, size=N),
-    "delivery_time": np.random.normal(10, 1, size=N),
-    "experiment_group": np.random.choice(["control", "treatment"], size=N),
-    "city": np.random.choice(["NYC", "LA"], size=N),
-    "customer_id": np.random.randint(1, 100, size=N),
-    "customer_age": np.random.randint(20, 60, size=N),
+historical_data = pd.DataFrame({
+    'user_id': range(N),
+    'metric': np.random.normal(100, 20, N),
+    'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d')
})

-pre_experiment_data = pd.DataFrame({
-    "order_value": np.random.normal(100, 10, size=N),
-    "customer_id": np.random.randint(1, 100, size=N),
+# Initialize analytical power analysis (fast, CLT-based)
+power_analysis = NormalPowerAnalysis.from_dict({
+    'analysis': 'ols',
+    'splitter': 'non_clustered',
+    'target_col': 'metric',
+    'time_col': 'date'  # Required for mde_time_line
})

-# Define test
-cupac_model = TargetAggregation(
-    agg_col="customer_id",
-    target_col="order_value"
-)
+# 1. Calculate power for a given effect size
+power = power_analysis.power_analysis(historical_data, average_effect=5.0)
+print(f"Power for detecting +5 unit effect: {power:.1%}")

-hypothesis_test = HypothesisTest(
-    metric=SimpleMetric(alias="AOV", name="order_value"),
-    analysis_type="clustered_ols",
-    analysis_config={
-        "cluster_cols": ["customer_id"],
-        "covariates": ["customer_age", "estimate_order_value"],
-    },
-    cupac_config={
-        "cupac_model": cupac_model,
-        "target_col": "order_value",
-    },
+# 2. Calculate Minimum Detectable Effect (MDE) for desired power
+mde = power_analysis.mde(historical_data, power=0.8)
+print(f"Minimum detectable effect at 80% power: {mde:.2f}")
+
+# 3. Power curve: How power changes with effect size
+power_curve = power_analysis.power_line(
+    historical_data,
+    average_effects=[2.0, 4.0, 6.0, 8.0, 10.0]
)
+print(power_curve)
+# Tip: power_line returns a dict of {average_effect: power}; plot it with:
+# plt.plot(list(power_curve.keys()), list(power_curve.values()))

-# Create analysis plan
-plan = AnalysisPlan(
-    tests=[hypothesis_test],
-    variants=[
-        Variant("control", is_control=True),
-        Variant("treatment", is_control=False),
-    ],
-    variant_col="experiment_group",
+# 4. MDE timeline: How MDE changes with experiment length
+mde_timeline = power_analysis.mde_time_line(
+    historical_data,
+    powers=[0.8],
+    experiment_length=[7, 14, 21, 30]
)
+```

-# Run analysis
-results = plan.analyze(experiment_data, pre_experiment_data)
-print(results.to_dataframe())
+**Output:**
+```
+Power for detecting +5 unit effect: 72.7%
+Minimum detectable effect at 80% power: 5.46
+{2.0: 0.17658708766689768, 4.0: 0.5367343456559069, 6.0: 0.8682558423423066, 8.0: 0.983992856563122, 10.0: 0.9992385426477484}
```

-## Installation
+**Key methods:**
+- `power_analysis()`: Calculate power for a given effect
+- `mde()`: Calculate minimum detectable effect
+- `power_line()`: Generate power curves across effect sizes
+- `mde_time_line()`: Calculate MDE for different experiment lengths

-You can install this package via `pip`.
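+
+The same numbers can be estimated by Monte Carlo simulation with `PowerAnalysis`, which repeatedly splits the data, injects a synthetic effect through a perturbator, and counts how often the effect is detected. A minimal sketch, assuming the same dict-config keys accepted by `PowerAnalysis.from_dict` (the `perturbator` and `n_simulations` keys come from the library's `PowerConfig`):
+
+```python
+from cluster_experiments import PowerAnalysis
+
+# Simulation-based power analysis: for each of the n_simulations runs,
+# randomly split the data, add a constant effect of 5.0 to the treated
+# group, run the OLS analysis, and record whether it comes out significant
+sim_power = PowerAnalysis.from_dict({
+    "analysis": "ols",
+    "perturbator": "constant",
+    "splitter": "non_clustered",
+    "n_simulations": 50,
+    "target_col": "metric",
+})
+power_sim = sim_power.power_analysis(historical_data, average_effect=5.0)
+```
+
+Simulation is slower than the normal approximation, but it extends to designs where no closed form is available.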
+For simulation-based power analysis (for complex designs), see the [Power Analysis Guide](https://david26694.github.io/cluster-experiments/power_analysis_guide.html). -```bash -pip install cluster-experiments -``` +--- + +## 📚 Documentation + +For detailed guides, API references, and advanced examples, visit our [**documentation**](https://david26694.github.io/cluster-experiments/). + +### Core Concepts + +The library is built around three main components: + +#### 1. **Splitter** - Define how to randomize + +Choose how to split your data into control and treatment groups: + +- `NonClusteredSplitter`: Standard individual-level randomization +- `ClusteredSplitter`: Cluster-level randomization +- `SwitchbackSplitter`: Time-based alternating treatments +- `StratifiedClusteredSplitter`: Balance randomization across strata + +#### 2. **Analysis** - Measure the impact + +Select the appropriate statistical method for your design: + +- `OLSAnalysis`: Standard regression for A/B tests +- `ClusteredOLSAnalysis`: Clustered standard errors for cluster-randomized designs +- `TTestClusteredAnalysis`: T-tests on cluster-aggregated data +- `GeeExperimentAnalysis`: GEE for correlated observations +- `SyntheticControlAnalysis`: Observational studies with synthetic controls + +#### 3. **AnalysisPlan** - Orchestrate your analysis + +Define your complete analysis workflow: + +- Specify metrics (simple and ratio) +- Define variants and dimensions +- Configure hypothesis tests +- Generate comprehensive scorecards + +For **power analysis**, combine these with: + +- **Perturbator**: Simulate treatment effects for power calculations +- **PowerAnalysis**: Estimate statistical power and sample sizes + +--- + +## 🛠️ Advanced Features -For detailed documentation and examples, visit our [documentation site](https://david26694.github.io/cluster-experiments/). - -## Features - -The library offers the following classes: - -* Regarding power analysis: - * `PowerAnalysis`: to run power analysis on any experiment design, using simulation - * `PowerAnalysisWithPreExperimentData`: to run power analysis on a clustered/switchback design, but adding pre-experiment df during split and perturbation (especially useful for Synthetic Control) - * `NormalPowerAnalysis`: to run power analysis on any experiment design using the central limit theorem for the distribution of the estimator. It can be used to compute the minimum detectable effect (MDE) for a given power level. 
- * `ConstantPerturbator`: to artificially perturb treated group with constant perturbations - * `BinaryPerturbator`: to artificially perturb treated group for binary outcomes - * `RelativePositivePerturbator`: to artificially perturb treated group with relative positive perturbations - * `RelativeMixedPerturbator`: to artificially perturb treated group with relative perturbations for positive and negative targets - * `NormalPerturbator`: to artificially perturb treated group with normal distribution perturbations - * `BetaRelativePositivePerturbator`: to artificially perturb treated group with relative positive beta distribution perturbations - * `BetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval - * `SegmentedBetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval, but using clusters -* Regarding splitting data: - * `ClusteredSplitter`: to split data based on clusters - * `FixedSizeClusteredSplitter`: to split data based on clusters with a fixed size (example: only 1 treatment cluster and the rest in control) - * `BalancedClusteredSplitter`: to split data based on clusters in a balanced way - * `NonClusteredSplitter`: Regular data splitting, no clusters - * `StratifiedClusteredSplitter`: to split based on clusters and strata, balancing the number of clusters in each stratus - * `RepeatedSampler`: for backtests where we have access to counterfactuals, does not split the data, just duplicates the data for all groups - * Switchback splitters (the same can be done with clustered splitters, but there is a convenient way to define switchback splitters using switch frequency): - * `SwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments - * `BalancedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing treatment and control among all clusters - * `StratifiedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing the number of clusters in each stratus - * Washover for switchback experiments: - * `EmptyWashover`: no washover done at all. - * `ConstantWashover`: accepts a timedelta parameter and removes the data when we switch from A to B for the timedelta interval. 
-* Regarding analysis methods: - * `GeeExperimentAnalysis`: to run GEE analysis on the results of a clustered design - * `MLMExperimentAnalysis`: to run Mixed Linear Model analysis on the results of a clustered design - * `TTestClusteredAnalysis`: to run a t-test on aggregated data for clusters - * `PairedTTestClusteredAnalysis`: to run a paired t-test on aggregated data for clusters - * `ClusteredOLSAnalysis`: to run OLS analysis on the results of a clustered design - * `OLSAnalysis`: to run OLS analysis for non-clustered data - * `DeltaMethodAnalysis`: to run Delta Method Analysis for clustered designs - * `TargetAggregation`: to add pre-experimental data of the outcome to reduce variance - * `SyntheticControlAnalysis`: to run synthetic control analysis -* Regarding experiment analysis workflow: - * `Metric`: abstract class to define a metric to be used in the analysis - * `SimpleMetric`: to create a metric defined at the same level of the data used for the analysis - * `RatioMetric`: to create a metric defined at a lower level than the data used for the analysis - * `Variant`: to define a variant of the experiment - * `Dimension`: to define a dimension to slice the results of the experiment - * `HypothesisTest`: to define a Hypothesis Test with a metric, analysis method, optional analysis configuration, and optional dimensions - * `AnalysisPlan`: to define a plan of analysis with a list of Hypothesis Tests for a dataset and the experiment variants. The `analyze()` method runs the analysis and returns the results - * `AnalysisResults`: to store the results of an analysis -* Other: - * `PowerConfig`: to conveniently configure `PowerAnalysis` class - * `ConfidenceInterval`: to store the data representation of a confidence interval - * `InferenceResults`: to store the structure of complete statistical analysis results +### Variance Reduction (CUPED/CUPAC) + +Reduce variance and detect smaller effects by leveraging pre-experiment data. Use historical metrics as covariates to control for pre-existing differences between groups. + +**Use cases:** + +- Have pre-experiment metrics for your users/clusters +- Want to detect smaller treatment effects +- Need more sensitive tests with same sample size + +See the [CUPAC Example](https://david26694.github.io/cluster-experiments/cupac_example.html) for detailed implementation. + +### Cluster Randomization + +Handle experiments where randomization occurs at group level (stores, cities, regions) rather than individual level. Essential for managing spillover effects and operational constraints. + +See the [Cluster Randomization Guide](https://david26694.github.io/cluster-experiments/examples/cluster_randomization.html) for details. + +### Switchback Experiments + +Design and analyze time-based crossover experiments where the same units receive both control and treatment at different times. + +See the [Switchback Example](https://david26694.github.io/cluster-experiments/switchback.html) for implementation. 
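+
+As a condensed, illustrative sketch of the CUPED/CUPAC workflow described above (column names are hypothetical, and `experiment_data` / `pre_experiment_data` are pandas DataFrames at the same granularity):
+
+```python
+from cluster_experiments import (
+    AnalysisPlan, HypothesisTest, SimpleMetric, TargetAggregation, Variant
+)
+
+# TargetAggregation aggregates the pre-experiment outcome per customer and
+# exposes the prediction as the "estimate_order_value" covariate
+cupac_model = TargetAggregation(agg_col="customer_id", target_col="order_value")
+
+test = HypothesisTest(
+    metric=SimpleMetric(alias="AOV", name="order_value"),
+    analysis_type="clustered_ols",
+    analysis_config={
+        "cluster_cols": ["customer_id"],
+        "covariates": ["estimate_order_value"],
+    },
+    cupac_config={"cupac_model": cupac_model, "target_col": "order_value"},
+)
+
+plan = AnalysisPlan(
+    tests=[test],
+    variants=[
+        Variant("control", is_control=True),
+        Variant("treatment", is_control=False),
+    ],
+    variant_col="experiment_group",
+)
+
+# Pre-experiment data goes in as a second argument so the covariate can be fit
+results = plan.analyze(experiment_data, pre_experiment_data)
+```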
+ +--- + +## 🌟 Support + +- ⭐ Star us on [GitHub](https://github.com/david26694/cluster-experiments) +- 📝 Read the [documentation](https://david26694.github.io/cluster-experiments/) +- 🐛 Report issues on our [issue tracker](https://github.com/david26694/cluster-experiments/issues) +- 💬 Join discussions in [GitHub Discussions](https://github.com/david26694/cluster-experiments/discussions) + +--- + +## 📚 Citation + +If you use cluster-experiments in your research, please cite: + +```bibtex +@software{cluster_experiments, + author = {David Masip and contributors}, + title = {cluster-experiments: A Python library for designing and analyzing experiments}, + url = {https://github.com/david26694/cluster-experiments}, + year = {2022} +} +``` diff --git a/docs/examples/cluster_randomization.ipynb b/docs/examples/cluster_randomization.ipynb new file mode 100644 index 00000000..21b71313 --- /dev/null +++ b/docs/examples/cluster_randomization.ipynb @@ -0,0 +1,395 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Cluster Randomization Example\n", + "\n", + "This notebook demonstrates how to analyze a **cluster-randomized experiment** where randomization occurs at the group level (e.g., stores, cities, schools) rather than at the individual level.\n", + "\n", + "## Why Cluster Randomization?\n", + "\n", + "Cluster randomization is necessary when:\n", + "\n", + "1. **Spillover Effects**: Treatment of one individual affects others (e.g., testing driver incentives in ride-sharing)\n", + "2. **Operational Constraints**: You can't randomize at the individual level (e.g., testing a store layout)\n", + "3. **Cost Efficiency**: It's cheaper to randomize groups than individuals\n", + "\n", + "## Key Consideration\n", + "\n", + "With cluster randomization, you need to account for **intra-cluster correlation** - observations within the same cluster are more similar than observations from different clusters. This requires using **clustered standard errors** or cluster-level analysis methods.\n", + "\n", + "## Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from cluster_experiments import AnalysisPlan\n", + "\n", + "# Set random seed for reproducibility\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Simulate Cluster-Randomized Experiment\n", + "\n", + "Let's simulate an experiment where we test a promotional campaign across different stores. Each store is randomly assigned to control or treatment, and we observe multiple transactions per store.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total transactions: 5,055\n", + "Stores in control: 23\n", + "Stores in treatment: 27\n", + "\n", + "First few rows:\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
store_idvariantpurchase_amount
00control83.479541
10control78.039264
20control65.286167
30control63.589803
40control94.543677
\n", + "
" + ], + "text/plain": [ + " store_id variant purchase_amount\n", + "0 0 control 83.479541\n", + "1 0 control 78.039264\n", + "2 0 control 65.286167\n", + "3 0 control 63.589803\n", + "4 0 control 94.543677" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Define parameters\n", + "n_stores = 50 # Number of stores (clusters)\n", + "transactions_per_store = 100 # Average transactions per store\n", + "\n", + "# Step 1: Randomly assign stores to treatment\n", + "stores = pd.DataFrame({\n", + " 'store_id': range(n_stores),\n", + " 'variant': np.random.choice(['control', 'treatment'], n_stores),\n", + "})\n", + "\n", + "# Step 2: Generate transaction-level data\n", + "transactions = []\n", + "for _, store in stores.iterrows():\n", + " n_transactions = np.random.poisson(transactions_per_store)\n", + " \n", + " # Base purchase amount\n", + " base_amount = 50\n", + " \n", + " # Treatment effect: +$5 average purchase\n", + " treatment_effect = 5 if store['variant'] == 'treatment' else 0\n", + " \n", + " # Store-level random effect (intra-cluster correlation)\n", + " store_effect = np.random.normal(0, 10)\n", + " \n", + " # Generate transactions\n", + " store_transactions = pd.DataFrame({\n", + " 'store_id': store['store_id'],\n", + " 'variant': store['variant'],\n", + " 'purchase_amount': np.random.normal(\n", + " base_amount + treatment_effect + store_effect, \n", + " 20, \n", + " n_transactions\n", + " ).clip(min=0) # No negative purchases\n", + " })\n", + " \n", + " transactions.append(store_transactions)\n", + "\n", + "data = pd.concat(transactions, ignore_index=True)\n", + "\n", + "print(f\"Total transactions: {len(data):,}\")\n", + "print(f\"Stores in control: {(stores['variant'] == 'control').sum()}\")\n", + "print(f\"Stores in treatment: {(stores['variant'] == 'treatment').sum()}\")\n", + "print(f\"\\nFirst few rows:\")\n", + "data.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Naive Analysis (WRONG!)\n", + "\n", + "First, let's see what happens if we ignore the clustering and use standard OLS. 
**This is wrong** because it doesn't account for intra-cluster correlation and will give you incorrect standard errors (typically too small, leading to false positives).\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Naive Analysis (Ignoring Clusters) ===\n", + "Treatment Effect: $4.26\n", + "Standard Error: $0.63\n", + "P-value: 0.0000\n", + "95% CI: [$3.03, $5.48]\n" + ] + } + ], + "source": [ + "# Naive analysis without clustering\n", + "naive_plan = AnalysisPlan.from_metrics_dict({\n", + " 'metrics': [\n", + " {\n", + " 'alias': 'purchase_amount',\n", + " 'name': 'purchase_amount',\n", + " 'metric_type': 'simple'\n", + " },\n", + " ],\n", + " 'variants': [\n", + " {'name': 'control', 'is_control': True},\n", + " {'name': 'treatment', 'is_control': False},\n", + " ],\n", + " 'variant_col': 'variant',\n", + " 'analysis_type': 'ols', # Standard OLS (WRONG for clustered data!)\n", + "})\n", + "\n", + "naive_results = naive_plan.analyze(data).to_dataframe()\n", + "print(\"=== Naive Analysis (Ignoring Clusters) ===\")\n", + "print(f\"Treatment Effect: ${naive_results.iloc[0]['ate']:.2f}\")\n", + "print(f\"Standard Error: ${naive_results.iloc[0]['std_error']:.2f}\")\n", + "print(f\"P-value: {naive_results.iloc[0]['p_value']:.4f}\")\n", + "print(f\"95% CI: [${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Correct Analysis with Clustered Standard Errors\n", + "\n", + "Now let's do the **correct** analysis by accounting for the clustering. We'll use `clustered_ols` which computes cluster-robust standard errors.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Correct Analysis (With Clustering) ===\n", + "Treatment Effect: $4.26\n", + "Standard Error: $3.04\n", + "P-value: 0.1610\n", + "95% CI: [$-1.70, $10.21]\n" + ] + } + ], + "source": [ + "# Correct analysis with clustered standard errors\n", + "clustered_plan = AnalysisPlan.from_metrics_dict({\n", + " 'metrics': [\n", + " {\n", + " 'alias': 'purchase_amount',\n", + " 'name': 'purchase_amount',\n", + " 'metric_type': 'simple'\n", + " },\n", + " ],\n", + " 'variants': [\n", + " {'name': 'control', 'is_control': True},\n", + " {'name': 'treatment', 'is_control': False},\n", + " ],\n", + " 'variant_col': 'variant',\n", + " 'analysis_type': 'clustered_ols', # Clustered OLS (CORRECT!)\n", + " 'analysis_config': {\n", + " 'cluster_cols': ['store_id'] # Specify the clustering variable\n", + " }\n", + "})\n", + "\n", + "clustered_results = clustered_plan.analyze(data).to_dataframe()\n", + "print(\"=== Correct Analysis (With Clustering) ===\")\n", + "print(f\"Treatment Effect: ${clustered_results.iloc[0]['ate']:.2f}\")\n", + "print(f\"Standard Error: ${clustered_results.iloc[0]['std_error']:.2f}\")\n", + "print(f\"P-value: {clustered_results.iloc[0]['p_value']:.4f}\")\n", + "print(f\"95% CI: [${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
Compare Results\n", + "\n", + "Let's compare the two approaches side by side:\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "=== Comparison ===\n", + " Method Treatment Effect Standard Error P-value 95% CI\n", + " Naive (OLS) $4.26 $0.63 0.0000 [$3.03, $5.48]\n", + "Correct (Clustered OLS) $4.26 $3.04 0.1610 [$-1.70, $10.21]\n", + "\n", + "Notice: The clustered standard errors are LARGER, reflecting the\n", + "additional uncertainty from intra-cluster correlation.\n" + ] + } + ], + "source": [ + "comparison = pd.DataFrame({\n", + " 'Method': ['Naive (OLS)', 'Correct (Clustered OLS)'],\n", + " 'Treatment Effect': [\n", + " f\"${naive_results.iloc[0]['ate']:.2f}\",\n", + " f\"${clustered_results.iloc[0]['ate']:.2f}\"\n", + " ],\n", + " 'Standard Error': [\n", + " f\"${naive_results.iloc[0]['std_error']:.2f}\",\n", + " f\"${clustered_results.iloc[0]['std_error']:.2f}\"\n", + " ],\n", + " 'P-value': [\n", + " f\"{naive_results.iloc[0]['p_value']:.4f}\",\n", + " f\"{clustered_results.iloc[0]['p_value']:.4f}\"\n", + " ],\n", + " '95% CI': [\n", + " f\"[${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\",\n", + " f\"[${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\"\n", + " ]\n", + "})\n", + "\n", + "print(\"\\n=== Comparison ===\")\n", + "print(comparison.to_string(index=False))\n", + "print(\"\\nNotice: The clustered standard errors are LARGER, reflecting the\")\n", + "print(\"additional uncertainty from intra-cluster correlation.\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways\n", + "\n", + "1. **Always account for clustering** in your analysis when randomization happens at the cluster level\n", + "2. **Clustered standard errors are typically larger** than naive standard errors\n", + "3. **Ignoring clustering leads to overstated confidence** - you might claim significance when there isn't any\n", + "4. 
**Use `clustered_ols` analysis type** and specify `cluster_cols` in the analysis config\n", + "\n", + "## When to Use Clustering\n", + "\n", + "Use clustered analysis when:\n", + "- ✅ Randomization is at the group level (stores, cities, schools)\n", + "- ✅ There are spillover effects between individuals\n", + "- ✅ Observations within groups are more similar than across groups\n", + "\n", + "Don't use clustering when:\n", + "- ❌ Randomization is truly at the individual level\n", + "- ❌ There's no reason to believe observations are correlated within groups\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/examples/simple_ab_test.ipynb b/docs/examples/simple_ab_test.ipynb new file mode 100644 index 00000000..61877c71 --- /dev/null +++ b/docs/examples/simple_ab_test.ipynb @@ -0,0 +1,515 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Simple A/B Test Example\n", + "\n", + "This notebook demonstrates a basic A/B test analysis using `cluster-experiments`.\n", + "\n", + "## Overview\n", + "\n", + "We'll simulate an experiment where we test a new feature's impact on:\n", + "- **Conversions** (simple metric): Whether a user made a purchase\n", + "- **Conversion Rate** (ratio metric): Conversions per visit\n", + "- **Revenue** (simple metric): Total revenue generated\n", + "\n", + "## Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from cluster_experiments import AnalysisPlan\n", + "\n", + "# Set random seed for reproducibility\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Generate Simulated Experiment Data\n", + "\n", + "Let's create a dataset with control and treatment groups.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset shape: (2000, 5)\n", + "\n", + "First few rows:\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
user_idvariantvisitsconvertedrevenue
00control7190.366149
11treatment1400.000000
22control1300.000000
33control700.000000
44control1600.000000
55treatment700.000000
66control1500.000000
77control1200.000000
88control1600.000000
99treatment800.000000
\n", + "
" + ], + "text/plain": [ + " user_id variant visits converted revenue\n", + "0 0 control 7 1 90.366149\n", + "1 1 treatment 14 0 0.000000\n", + "2 2 control 13 0 0.000000\n", + "3 3 control 7 0 0.000000\n", + "4 4 control 16 0 0.000000\n", + "5 5 treatment 7 0 0.000000\n", + "6 6 control 15 0 0.000000\n", + "7 7 control 12 0 0.000000\n", + "8 8 control 16 0 0.000000\n", + "9 9 treatment 8 0 0.000000" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n_users = 2000\n", + "\n", + "# Create base data\n", + "data = pd.DataFrame({\n", + " 'user_id': range(n_users),\n", + " 'variant': np.random.choice(['control', 'treatment'], n_users),\n", + " 'visits': np.random.poisson(10, n_users), # Number of visits\n", + "})\n", + "\n", + "# Simulate conversions (more likely for treatment)\n", + "data['converted'] = (\n", + " np.random.binomial(1, 0.10, n_users) | # Base conversion rate\n", + " (data['variant'] == 'treatment') & np.random.binomial(1, 0.03, n_users) # +3% for treatment\n", + ").astype(int)\n", + "\n", + "# Simulate revenue (higher for converters and treatment)\n", + "data['revenue'] = 0.0\n", + "converters = data['converted'] == 1\n", + "data.loc[converters, 'revenue'] = np.random.gamma(shape=2, scale=25, size=converters.sum())\n", + "\n", + "# Treatment group gets slightly higher revenue\n", + "treatment_converters = (data['variant'] == 'treatment') & converters\n", + "data.loc[treatment_converters, 'revenue'] *= 1.15\n", + "\n", + "print(f\"Dataset shape: {data.shape}\")\n", + "print(f\"\\nFirst few rows:\")\n", + "data.head(10)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Define Analysis Plan\n", + "\n", + "Now let's define our analysis plan with multiple metrics:\n", + "- **conversions**: Simple metric counting total conversions\n", + "- **conversion_rate**: Ratio metric (conversions / visits)\n", + "- **revenue**: Simple metric for total revenue\n" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Analysis plan created successfully!\n" + ] + } + ], + "source": [ + "from cluster_experiments import (\n", + " AnalysisPlan, SimpleMetric, RatioMetric, \n", + " Variant, HypothesisTest\n", + ")\n", + "\n", + "# Define metrics by type\n", + "simple_metrics = {\n", + " \"conversions\": \"converted\", # alias: column_name\n", + " \"revenue\": \"revenue\"\n", + "}\n", + "\n", + "ratio_metrics = {\n", + " \"conversion_rate\": {\n", + " \"numerator\": \"converted\",\n", + " \"denominator\": \"visits\"\n", + " }\n", + "}\n", + "\n", + "# Define variants\n", + "variants = [\n", + " Variant(\"control\", is_control=True),\n", + " Variant(\"treatment\", is_control=False)\n", + "]\n", + "\n", + "# Build hypothesis tests\n", + "hypothesis_tests = []\n", + "\n", + "# Ratio metrics: use delta method\n", + "for alias, config in ratio_metrics.items():\n", + " metric = RatioMetric(\n", + " alias=alias,\n", + " numerator_name=config[\"numerator\"],\n", + " denominator_name=config[\"denominator\"]\n", + " )\n", + " hypothesis_tests.append(\n", + " HypothesisTest(\n", + " metric=metric,\n", + " analysis_type=\"delta\",\n", + " analysis_config={\n", + " \"scale_col\": metric.denominator_name,\n", + " \"cluster_cols\": [\"user_id\"]\n", + " }\n", + " )\n", + " )\n", + "\n", + "# Simple metrics: use OLS\n", + "for alias, column_name in simple_metrics.items():\n", + " metric = SimpleMetric(alias=alias, 
name=column_name)\n", + " hypothesis_tests.append(\n", + " HypothesisTest(\n", + " metric=metric,\n", + " analysis_type=\"ols\"\n", + " )\n", + " )\n", + "\n", + "# Create analysis plan\n", + "analysis_plan = AnalysisPlan(\n", + " tests=hypothesis_tests,\n", + " variants=variants,\n", + " variant_col='variant'\n", + ")\n", + "\n", + "print(\"Analysis plan created successfully!\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Run Analysis\n", + "\n", + "Let's run the analysis and generate a comprehensive scorecard.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "=== Experiment Results ===\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/luiz.henrique/Documents/GitHub/cluster-experiments/.venv/lib/python3.9/site-packages/cluster_experiments/experiment_analysis.py:1671: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n", + " return df.groupby(self.treatment_col).apply(\n", + "/Users/luiz.henrique/Documents/GitHub/cluster-experiments/.venv/lib/python3.9/site-packages/cluster_experiments/experiment_analysis.py:1676: UserWarning: Delta Method approximation may not be accurate for small group sizes\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
metric_aliascontrol_variant_nametreatment_variant_namecontrol_variant_meantreatment_variant_meananalysis_typeateate_ci_lowerate_ci_upperp_valuestd_errordimension_namedimension_valuealpha
0conversion_ratecontroltreatment0.0099720.011912delta0.001940-0.0008250.0047060.1690060.001411__total_dimensiontotal0.05
1conversionscontroltreatment0.1003940.117886ols0.017492-0.0098740.0448590.2102850.013963__total_dimensiontotal0.05
2revenuecontroltreatment5.4515157.359327ols1.907812-0.1304883.9461120.0665811.039968__total_dimensiontotal0.05
\n", + "
" + ], + "text/plain": [ + " metric_alias control_variant_name treatment_variant_name \\\n", + "0 conversion_rate control treatment \n", + "1 conversions control treatment \n", + "2 revenue control treatment \n", + "\n", + " control_variant_mean treatment_variant_mean analysis_type ate \\\n", + "0 0.009972 0.011912 delta 0.001940 \n", + "1 0.100394 0.117886 ols 0.017492 \n", + "2 5.451515 7.359327 ols 1.907812 \n", + "\n", + " ate_ci_lower ate_ci_upper p_value std_error dimension_name \\\n", + "0 -0.000825 0.004706 0.169006 0.001411 __total_dimension \n", + "1 -0.009874 0.044859 0.210285 0.013963 __total_dimension \n", + "2 -0.130488 3.946112 0.066581 1.039968 __total_dimension \n", + "\n", + " dimension_value alpha \n", + "0 total 0.05 \n", + "1 total 0.05 \n", + "2 total 0.05 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Run analysis\n", + "results = analysis_plan.analyze(data)\n", + "\n", + "# View results as a dataframe\n", + "results_df = results.to_dataframe()\n", + "print(\"\\n=== Experiment Results ===\")\n", + "results_df\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This example demonstrated:\n", + "\n", + "1. ✅ **Data Simulation**: Creating realistic experiment data\n", + "2. ✅ **Multiple Metric Types**: Analyzing both simple and ratio metrics\n", + "3. ✅ **Easy Configuration**: Using dictionary-based analysis plan setup\n", + "4. ✅ **Comprehensive Results**: Getting treatment effects, confidence intervals, and p-values\n", + "\n", + "## Next Steps\n", + "\n", + "- Try the [CUPAC example](../cupac_example.html) to learn about variance reduction\n", + "- Explore [cluster randomization](cluster_randomization.html) for handling correlated units\n", + "- Learn about [switchback experiments](../switchback.html) for time-based designs\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/license.md b/docs/license.md new file mode 100644 index 00000000..1731b8a3 --- /dev/null +++ b/docs/license.md @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2022 David Masip + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/docs/quick_start_power_curve.png b/docs/quick_start_power_curve.png new file mode 100644 index 00000000..b8df592e Binary files /dev/null and b/docs/quick_start_power_curve.png differ diff --git a/docs/quickstart.md b/docs/quickstart.md new file mode 100644 index 00000000..ee06e296 --- /dev/null +++ b/docs/quickstart.md @@ -0,0 +1,287 @@ +# Quickstart + +Get started with `cluster-experiments` in minutes! This guide will walk you through installation and your first experiment analysis. + +--- + +## Installation + +Install via pip: + +```bash +pip install cluster-experiments +``` + +!!! info "Requirements" + - **Python 3.8 or higher** + - Main dependencies: `pandas`, `numpy`, `scipy`, `statsmodels` + +--- + +## 1. Your First Analysis + +Let's analyze a simple A/B test with multiple metrics. This is the most common use case. *See [Simple A/B Test](examples/simple_ab_test.html) for a complete walkthrough.* + + +```python +import pandas as pd +import numpy as np +from cluster_experiments import AnalysisPlan, Variant + +# 1. Set seed for reproducibility +np.random.seed(42) + +# 2. Create simulated data +N = 1_000 +df = pd.DataFrame({ + "variant": np.random.choice(["control", "treatment"], N), + "orders": np.random.poisson(10, N), + "visits": np.random.poisson(100, N), +}) +# Add some treatment effect to orders +df.loc[df["variant"] == "treatment", "orders"] += np.random.poisson(1, df[df["variant"] == "treatment"].shape[0]) + +df["converted"] = (df["orders"] > 0).astype(int) +df["cost"] = np.random.normal(50, 10, N) # New metric: cost +df["clicks"] = np.random.poisson(200, N) # New metric: clicks + +# 3. Define your analysis plan +plan = AnalysisPlan.from_metrics_dict({ + "metrics": [ + {"name": "orders", "alias": "revenue", "metric_type": "simple"}, + {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"}, + {"name": "cost", "alias": "avg_cost", "metric_type": "simple"}, + {"name": "clicks", "alias": "ctr", "metric_type": "ratio", "numerator": "clicks", "denominator": "visits"} + ], + "variants": [ + {"name": "control", "is_control": True}, + {"name": "treatment", "is_control": False} + ], + "variant_col": "variant", + "analysis_type": "ols" +}) + +# 4. Run analysis on your dataframe +results = plan.analyze(df) +print(results.to_dataframe().head()) +``` + +**Output:** +``` + metric_alias control_variant_name treatment_variant_name control_variant_mean treatment_variant_mean analysis_type ate ate_ci_lower ate_ci_upper p_value std_error dimension_name dimension_value alpha +0 revenue control treatment 9.973469 10.994118 ols 1.020648e+00 6.140829e-01 1.427214e+00 8.640027e-07 2.074351e-01 __total_dimension total 0.05 +1 conversion control treatment 1.000000 1.000000 ols -4.163336e-16 -5.971983e-16 -2.354689e-16 6.432406e-06 9.227960e-17 __total_dimension total 0.05 +2 avg_cost control treatment 49.463206 49.547386 ols 8.417999e-02 -1.222365e+00 1.390725e+00 8.995107e-01 6.666166e-01 __total_dimension total 0.05 +3 ctr control treatment 199.795918 199.692157 ols -1.037615e-01 -1.767938e+00 1.560415e+00 9.027376e-01 8.490855e-01 __total_dimension total 0.05 +``` + +--- + +## 1.1. 
Understanding Your Results
+
+The results dataframe includes:
+
+| Column | Description |
+|--------|-------------|
+| `metric_alias` | Name of the metric being analyzed |
+| `control_variant_mean` | Average value in the control group |
+| `treatment_variant_mean` | Average value in the treatment group |
+| `ate` | Average Treatment Effect (absolute difference) |
+| `ate_ci_lower` / `ate_ci_upper` | 95% confidence interval for the ATE |
+| `p_value` | Statistical significance (< 0.05 = significant) |
+
+!!! tip "Interpreting Results"
+    - **p_value < 0.05**: Result is statistically significant (95% confidence)
+    - **Confidence interval**: If it doesn't include 0, the effect is significant (95% confidence)
+
+---
+
+#### 1.2. Analysis Extensions: Ratio Metrics
+
+`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value), as seen in the first example:
+
+```python
+# Ratio metric: conversions / visits
+{
+    'alias': 'conversion_rate',
+    'metric_type': 'ratio',
+    'numerator': 'converted',   # Numerator column
+    'denominator': 'visits'     # Denominator column
+}
+```
+
+The library automatically handles the statistical complexities of ratio metrics using the Delta Method.
+
+#### 1.3. Analysis Extensions: Multi-dimensional Analysis
+
+Slice your results by dimensions (e.g., city, device type):
+
+```python
+analysis_plan = AnalysisPlan.from_metrics_dict({
+    'metrics': [...],
+    'variants': [...],
+    'variant_col': 'variant',
+    'dimensions': [
+        {'name': 'city', 'values': ['NYC', 'LA', 'Chicago']},
+        {'name': 'device', 'values': ['mobile', 'desktop']},
+    ],
+    'analysis_type': 'ols',
+})
+```
+
+Results will include treatment effects for each dimension slice.
+
+---
+
+## 2. Power Analysis
+
+Before running an experiment, it's crucial to know how long it needs to run to detect a significant effect.
+See the [Power Analysis Guide](power_analysis_guide.html) for more complex designs (switchback, cluster randomization) and simulation methods.
+
+### 2.1. MDE
+
+Calculate the Minimum Detectable Effect (MDE) for a given sample size, significance level ($\alpha$), and power ($1 - \beta$).
+
+```python
+import pandas as pd
+import numpy as np
+from cluster_experiments import NormalPowerAnalysis
+
+# Create sample historical data
+np.random.seed(42)
+N = 500
+historical_data = pd.DataFrame({
+    'user_id': range(N),
+    'metric': np.random.normal(100, 20, N),
+    'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d')
+})
+
+power_analysis = NormalPowerAnalysis.from_dict({
+    'analysis': 'ols',
+    'splitter': 'non_clustered',
+    'target_col': 'metric',
+    'time_col': 'date'
+})
+
+mde = power_analysis.mde(historical_data, power=0.8)
+print(f"Minimum Detectable Effect: {mde}")
+```
+
+**Output:**
+```
+Minimum Detectable Effect: 4.935302024560818
+```
+
+### 2.2. Calculate Power
+
+Calculate the statistical power for a specific effect size you expect to see.
+
+```python
+power = power_analysis.power_analysis(historical_data, average_effect=3.5)
+print(f"Power: {power}")
+```
+
+**Output:**
+```
+Power: 0.510914982752414
+```
+
+### 2.3. Visualize Power Curve
+
+It's helpful to visualize how power changes with effect size.
+
+```python
+import matplotlib.pyplot as plt
+
+# Calculate power for multiple effect sizes
+effect_sizes = [2.0, 4.0, 6.0, 8.0, 10.0]
+power_curve = power_analysis.power_line(
+    historical_data,
+    average_effects=effect_sizes
+)
+
+# Plotting: power_line returns a dict of {average_effect: power}
+plt.figure(figsize=(10, 6))
+plt.plot(list(power_curve.keys()), list(power_curve.values()), marker='o')
+plt.title('Power Analysis: Effect Size vs Power')
+plt.xlabel('Effect Size')
+plt.ylabel('Power')
+plt.grid(True)
+plt.show()
+```
+
+![Power Analysis Curve](quick_start_power_curve.png)
+
+---
+
+## 3. Quick Reference
+
+### 3.1. Analysis Types
+
+Choose the appropriate analysis method:
+
+| Analysis Type | When to Use |
+|--------------|-------------|
+| `ols` | Standard A/B test, individual randomization |
+| `clustered_ols` | Cluster randomization (stores, cities, etc.) |
+| `gee` | Repeated measures, correlated observations |
+| `mlm` | Multi-level/hierarchical data |
+| `synthetic_control` | Observational studies, no randomization |
+
+### 3.2. Dictionary vs Class-Based API
+
+`cluster-experiments` offers two ways to define analysis plans, catering to different needs:
+
+#### 3.2.1. Dictionary Configuration
+
+Best for storing configurations in YAML/JSON files and automated pipelines.
+
+```python
+config = {
+    "metrics": [
+        {"name": "orders", "alias": "revenue", "metric_type": "simple"},
+        {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"}
+    ],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False}
+    ],
+    "variant_col": "variant",
+    "analysis_type": "ols"
+}
+
+plan = AnalysisPlan.from_metrics_dict(config)
+```
+
+#### 3.2.2. Class-Based API
+
+Best for exploration and custom extensions.
+ +```python +from cluster_experiments import HypothesisTest, SimpleMetric, Variant + +# Explicitly define objects +revenue_metric = SimpleMetric(name="orders", alias="revenue") +control = Variant("control", is_control=True) +treatment = Variant("treatment", is_control=False) + +plan = AnalysisPlan( + tests=[HypothesisTest(metric=revenue_metric, analysis_type="ols")], + variants=[control, treatment], + variant_col='variant' +) +``` + + + +## Next Steps + +Now that you've completed your first analysis, explore: + +- 📖 **[API Reference](api/experiment_analysis.html)** - Detailed documentation for all classes +- **[Example Gallery](cupac_example.html)** - Real-world use cases and patterns +- **[Power Analysis Guide](power_analysis_guide.html)** - Design experiments with confidence +- 🤝 **[Contributing](../CONTRIBUTING.md)** - Help improve the library + +--- + +## Getting Help + +- 📝 [Documentation](https://david26694.github.io/cluster-experiments/) +- 🐛 [Report Issues](https://github.com/david26694/cluster-experiments/issues) +- 💬 [Discussions](https://github.com/david26694/cluster-experiments/discussions) diff --git a/docs/stylesheets/overrides.css b/docs/stylesheets/overrides.css new file mode 100644 index 00000000..bf676b78 --- /dev/null +++ b/docs/stylesheets/overrides.css @@ -0,0 +1,13 @@ +/* Custom admonition styling */ +.md-typeset .admonition { + border-radius: 8px; + border-left: 4px solid var(--md-primary-fg-color); + box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1); +} + +/* Code block styling */ +.md-typeset pre { + border-radius: 8px; + background-color: var(--md-code-bg-color); + box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1); +} diff --git a/docs/stylesheets/style.css b/docs/stylesheets/style.css new file mode 100644 index 00000000..c22863bb --- /dev/null +++ b/docs/stylesheets/style.css @@ -0,0 +1,9 @@ +/* Apply text justification to all paragraphs in the documentation */ +.md-content p { + text-align: justify; +} + +/* Optionally, justify lists or other specific elements */ +.md-content ul, .md-content ol { + text-align: justify; +} diff --git a/mkdocs.yml b/mkdocs.yml index cdb81af7..79d40e9f 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,66 +1,105 @@ site_name: Cluster Experiments Docs -extra_css: [style.css] repo_url: https://github.com/david26694/cluster-experiments site_url: https://david26694.github.io/cluster-experiments/ site_description: Functions to design and run clustered experiments site_author: David Masip use_directory_urls: false edit_uri: blob/main/docs/ +docs_dir: docs +site_dir: site + nav: - - Home: - - Index: index.md - - End-to-end example: e2e_mde.ipynb - - Cupac example: cupac_example.ipynb - - Custom classes: create_custom_classes.ipynb - - Switchback: - - Stratified switchback: switchback.ipynb - - Switchback calendar visualization: plot_calendars.ipynb - - Visualization - 4-hour switches: plot_calendars_hours.ipynb - - E2E switchback design example: e2e_mde_switchback.ipynb - - Multiple treatments: multivariate.ipynb - - AA test clustered: aa_test.ipynb - - Paired T test: paired_ttest.ipynb - - Different hypotheses tests: analysis_with_different_hypotheses.ipynb - - Washover: washover_example.ipynb - - Normal Power: - - Compare with simulation: normal_power.ipynb - - Time-lines: normal_power_lines.ipynb - - Synthetic control: synthetic_control.ipynb - - Experiment analysis workflow: experiment_analysis.ipynb - - Delta method: - - Delta Method Analysis: delta_method.ipynb - - End-to-end delta method example: e2e_mde_delta.ipynb - - API: - - Experiment analysis 
methods: api/experiment_analysis.md + - Home: index.md + - Quickstart: + - Quickstart: quickstart.md + - Power Analysis Guide: normal_power_lines.ipynb + + - Examples: + - Basic Usage: + - Simple A/B Test: examples/simple_ab_test.ipynb + - Experiment Analysis Workflow: experiment_analysis.ipynb + - AA Test (Clustered): aa_test.ipynb + - Analysis Methods: + - Different Hypothesis Tests: analysis_with_different_hypotheses.ipynb + - Paired T-Test: paired_ttest.ipynb + - Delta Method Analysis: delta_method.ipynb + - Variance Reduction: + - CUPAC Example: cupac_example.ipynb + - Cluster Experiments: + - Cluster Randomization: examples/cluster_randomization.ipynb + - Switchback Experiments: + - Stratified Switchback: switchback.ipynb + - Calendar Visualization: plot_calendars.ipynb + - 4-Hour Switches: plot_calendars_hours.ipynb + - Washover Example: washover_example.ipynb + - Power Analysis: + - Normal Power Comparison: normal_power.ipynb + - Power Time-Lines: normal_power_lines.ipynb + - Advanced Topics: + - Multiple Treatments: multivariate.ipynb + - Synthetic Control: synthetic_control.ipynb + - Custom Classes: create_custom_classes.ipynb + + - API Reference: + - Experiment Analysis: + - Analysis Plan: api/analysis_plan.md + - Analysis Results: api/analysis_results.md + - Experiment Analysis Methods: api/experiment_analysis.md + - Hypothesis Test: api/hypothesis_test.md + - Metrics & Variants: + - Metric: api/metric.md + - Variant: api/variant.md + - Dimension: api/dimension.md + - Power Analysis: + - Power Analysis: api/power_analysis.md + - Power Config: api/power_config.md + - Randomization: + - Splitters: api/random_splitter.md + - Variance Reduction: + - CUPAC Model: api/cupac_model.md + - Switchback: + - Washover: api/washover.md - Perturbators: api/perturbator.md - - Splitter: api/random_splitter.md - - Pre experiment outcome model: api/cupac_model.md - - Power config: api/power_config.md - - Power analysis: api/power_analysis.md - - Washover: api/washover.md - - Metric: api/metric.md - - Variant: api/variant.md - - Dimension: api/dimension.md - - Hypothesis Test: api/hypothesis_test.md - - Analysis Plan: api/analysis_plan.md + + - Contributing: CONTRIBUTING.md + - License: license.md + +extra: + social: + - icon: fontawesome/brands/github + link: https://github.com/david26694/cluster-experiments + plugins: - mkdocstrings: watch: - cluster_experiments - mkdocs-jupyter - search + +extra_css: + - stylesheets/overrides.css + - stylesheets/style.css + copyright: Copyright © 2022 Maintained by David Masip. + theme: name: material font: text: Ubuntu code: Ubuntu Mono - feature: - tabs: true + features: + - content.tabs + - content.code.annotate + - content.code.copy + - navigation.instant + - navigation.tracking + - navigation.top palette: primary: indigo accent: blue + markdown_extensions: + - admonition - codehilite - pymdownx.inlinehilite - pymdownx.superfences