From 5c6558c1e50c6dd4554f43c9ff8c0b4f9e6aa573 Mon Sep 17 00:00:00 2001 From: luizhsuperti <62964489+luizhsuperti@users.noreply.github.com> Date: Sat, 8 Feb 2025 17:03:05 +0100 Subject: [PATCH 1/9] Update .gitignore --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index fd1a6a21..e40cd467 100644 --- a/.gitignore +++ b/.gitignore @@ -170,3 +170,4 @@ todos.txt # experiments/ +cluster-experiments.code-workspace From b6babccdf976ae1f353e64dd6a4f12ef52285e7e Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Tue, 25 Feb 2025 20:25:57 +0100 Subject: [PATCH 2/9] Revamp documentation --- README.md | 281 ++++----------------------------- docs/quickstart.md | 134 ++++++++++++++++ docs/stylesheets/overrides.css | 13 ++ docs/stylesheets/style.css | 9 ++ mkdocs.yml | 74 ++++++--- 5 files changed, 237 insertions(+), 274 deletions(-) create mode 100644 docs/quickstart.md create mode 100644 docs/stylesheets/overrides.css create mode 100644 docs/stylesheets/style.css diff --git a/README.md b/README.md index 6b35f64b..30ae29c7 100644 --- a/README.md +++ b/README.md @@ -11,265 +11,46 @@ https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg ![License](https://img.shields.io/github/license/david26694/cluster-experiments) [![Pypi version](https://img.shields.io/pypi/pyversions/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments) -A Python library for end-to-end A/B testing workflows, featuring: -- Experiment analysis and scorecards -- Power analysis (simulation-based and normal approximation) -- Variance reduction techniques (CUPED, CUPAC) -- Support for complex experimental designs (cluster randomization, switchback experiments) +**`cluster experiments`** is a comprehensive Python library for end-to-end A/B testing workflows, designed for seamless integration with Pandas in production environments. -## Key Features +--- -### 1. Power Analysis -- **Simulation-based**: Run Monte Carlo simulations to estimate power -- **Normal approximation**: Fast power estimation using CLT -- **Minimum Detectable Effect**: Calculate required effect sizes -- **Multiple designs**: Support for: - - Simple randomization - - Variance reduction techniques in power analysis - - Cluster randomization - - Switchback experiments -- **Dict config**: Easy to configure power analysis with a dictionary +## πŸš€ Key Features -### 2. Experiment Analysis -- **Analysis Plans**: Define structured analysis plans -- **Metrics**: - - Simple metrics - - Ratio metrics -- **Dimensions**: Slice results by dimensions -- **Statistical Methods**: - - GEE - - Mixed Linear Models - - Clustered / regular OLS - - T-tests - - Synthetic Control -- **Dict config**: Easy to define analysis plans with a dictionary +### πŸ“Œ **Experiment Design & Planning** +- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation +- Support for **complex experimental designs**, including: + - 🏒 **Cluster randomization** + - πŸ”„ **Switchback experiments** -### 3. 
Variance Reduction
-- **CUPED** (Controlled-experiment Using Pre-Experiment Data):
-  - Use historical outcome data to reduce variance, choose any granularity
-  - Support for several covariates
-- **CUPAC** (Control Using Predictors as Covariates):
-  - Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data
+### πŸ›  **Data Preprocessing**
+- Tools for **efficient data preparation**
+- Seamlessly integrates with **Pandas** for streamlined workflows

-## Quick Start
+### πŸ“Š **Comprehensive Experiment Analysis**
+##### **βœ… Metrics**
+- Simple and **ratio-based metrics** for evaluating experiment outcomes

-### Power Analysis Example
+##### **πŸ“ˆ Statistical Methods**
+- πŸ“Œ **Generalized Estimating Equations (GEE)**
+- πŸ“Œ **Mixed Linear Models** for robust inference
+- πŸ“Œ **Ordinary Least Squares (OLS)** and **Clustered OLS** with covariates
+- πŸ“Œ **T-tests** with variance reduction techniques (**CUPED, CUPAC**)
+- πŸ“Œ **Synthetic control methods** for causal inference in observational studies

-```python
-import numpy as np
-import pandas as pd
-from cluster_experiments import PowerAnalysis, NormalPowerAnalysis
+---

-# Create sample data
-N = 1_000
-df = pd.DataFrame({
-    "target": np.random.normal(0, 1, size=N),
-    "date": pd.to_datetime(
-        np.random.randint(
-            pd.Timestamp("2024-01-01").value,
-            pd.Timestamp("2024-01-31").value,
-            size=N,
-        )
-    ),
-})
+### ⚑ Why Use `cluster experiments`?
+βœ… **Production-ready** – built for real-world applications
+βœ… **Data-driven decision-making** – designed for rigorous statistical analysis
+βœ… **Easy to use** – integrates effortlessly with Pandas

-# Simulation-based power analysis with CUPED
-config = {
-    "analysis": "ols",
-    "perturbator": "constant",
-    "splitter": "non_clustered",
-    "n_simulations": 50,
-}
-pw = PowerAnalysis.from_dict(config)
-power = pw.power_analysis(df, average_effect=0.1)
+---

-# Normal approximation (faster)
-npw = NormalPowerAnalysis.from_dict({
-    "analysis": "ols",
-    "splitter": "non_clustered",
-    "n_simulations": 5,
-    "time_col": "date",
-})
-power_normal = npw.power_analysis(df, average_effect=0.1)
-power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])
+`cluster experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows. 
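+
+### ⚑ Minimal example
+
+As a quick taste, power analysis can be configured from a plain dictionary (a minimal sketch, with synthetic data standing in for your own, using the dict-config API shown in the docs):
+
+```python
+import numpy as np
+import pandas as pd
+from cluster_experiments import PowerAnalysis
+
+# Synthetic stand-in for your experiment data
+df = pd.DataFrame({"target": np.random.normal(0, 1, size=1_000)})
+
+pw = PowerAnalysis.from_dict({
+    "analysis": "ols",            # OLS on the target column
+    "perturbator": "constant",    # add a constant effect to treated rows
+    "splitter": "non_clustered",  # simple, non-clustered randomization
+    "n_simulations": 50,
+})
+print(pw.power_analysis(df, average_effect=0.1))
+```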
+πŸ”— **Get Started:** [Documentation Link] -# MDE calculation -mde = npw.mde(df, power=0.8) - -# MDE line with length -mde_timeline = npw.mde_time_line( - df, - powers=[0.8], - experiment_length=[7, 14, 21] -) - -print(power, power_line_normal, power_normal, mde, mde_timeline) -``` - -### Experiment Analysis Example - -```python -import numpy as np -import pandas as pd -from cluster_experiments import AnalysisPlan - -N = 1_000 -experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "delivery_time": np.random.normal(10, 1, size=N), - "experiment_group": np.random.choice(["control", "treatment"], size=N), - "city": np.random.choice(["NYC", "LA"], size=N), - "customer_id": np.random.randint(1, 100, size=N), - "customer_age": np.random.randint(20, 60, size=N), -}) - -# Create analysis plan -plan = AnalysisPlan.from_metrics_dict({ - "metrics": [ - {"alias": "AOV", "name": "order_value"}, - {"alias": "delivery_time", "name": "delivery_time"}, - ], - "variants": [ - {"name": "control", "is_control": True}, - {"name": "treatment", "is_control": False}, - ], - "variant_col": "experiment_group", - "alpha": 0.05, - "dimensions": [ - {"name": "city", "values": ["NYC", "LA"]}, - ], - "analysis_type": "clustered_ols", - "analysis_config": {"cluster_cols": ["customer_id"]}, -}) -# Run analysis -print(plan.analyze(experiment_data).to_dataframe()) -``` - -### Variance Reduction Example - -```python -import numpy as np -import pandas as pd -from cluster_experiments import ( - AnalysisPlan, - SimpleMetric, - Variant, - Dimension, - TargetAggregation, - HypothesisTest -) - -N = 1000 - -experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "delivery_time": np.random.normal(10, 1, size=N), - "experiment_group": np.random.choice(["control", "treatment"], size=N), - "city": np.random.choice(["NYC", "LA"], size=N), - "customer_id": np.random.randint(1, 100, size=N), - "customer_age": np.random.randint(20, 60, size=N), -}) - -pre_experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "customer_id": np.random.randint(1, 100, size=N), -}) - -# Define test -cupac_model = TargetAggregation( - agg_col="customer_id", - target_col="order_value" -) - -hypothesis_test = HypothesisTest( - metric=SimpleMetric(alias="AOV", name="order_value"), - analysis_type="clustered_ols", - analysis_config={ - "cluster_cols": ["customer_id"], - "covariates": ["customer_age", "estimate_order_value"], - }, - cupac_config={ - "cupac_model": cupac_model, - "target_col": "order_value", - }, -) - -# Create analysis plan -plan = AnalysisPlan( - tests=[hypothesis_test], - variants=[ - Variant("control", is_control=True), - Variant("treatment", is_control=False), - ], - variant_col="experiment_group", -) - -# Run analysis -results = plan.analyze(experiment_data, pre_experiment_data) -print(results.to_dataframe()) -``` - -## Installation - -You can install this package via `pip`. - -```bash -pip install cluster-experiments -``` - -For detailed documentation and examples, visit our [documentation site](https://david26694.github.io/cluster-experiments/). 
- -## Features - -The library offers the following classes: - -* Regarding power analysis: - * `PowerAnalysis`: to run power analysis on any experiment design, using simulation - * `PowerAnalysisWithPreExperimentData`: to run power analysis on a clustered/switchback design, but adding pre-experiment df during split and perturbation (especially useful for Synthetic Control) - * `NormalPowerAnalysis`: to run power analysis on any experiment design using the central limit theorem for the distribution of the estimator. It can be used to compute the minimum detectable effect (MDE) for a given power level. - * `ConstantPerturbator`: to artificially perturb treated group with constant perturbations - * `BinaryPerturbator`: to artificially perturb treated group for binary outcomes - * `RelativePositivePerturbator`: to artificially perturb treated group with relative positive perturbations - * `RelativeMixedPerturbator`: to artificially perturb treated group with relative perturbations for positive and negative targets - * `NormalPerturbator`: to artificially perturb treated group with normal distribution perturbations - * `BetaRelativePositivePerturbator`: to artificially perturb treated group with relative positive beta distribution perturbations - * `BetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval - * `SegmentedBetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval, but using clusters -* Regarding splitting data: - * `ClusteredSplitter`: to split data based on clusters - * `FixedSizeClusteredSplitter`: to split data based on clusters with a fixed size (example: only 1 treatment cluster and the rest in control) - * `BalancedClusteredSplitter`: to split data based on clusters in a balanced way - * `NonClusteredSplitter`: Regular data splitting, no clusters - * `StratifiedClusteredSplitter`: to split based on clusters and strata, balancing the number of clusters in each stratus - * `RepeatedSampler`: for backtests where we have access to counterfactuals, does not split the data, just duplicates the data for all groups - * Switchback splitters (the same can be done with clustered splitters, but there is a convenient way to define switchback splitters using switch frequency): - * `SwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments - * `BalancedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing treatment and control among all clusters - * `StratifiedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing the number of clusters in each stratus - * Washover for switchback experiments: - * `EmptyWashover`: no washover done at all. - * `ConstantWashover`: accepts a timedelta parameter and removes the data when we switch from A to B for the timedelta interval. 
-* Regarding analysis methods: - * `GeeExperimentAnalysis`: to run GEE analysis on the results of a clustered design - * `MLMExperimentAnalysis`: to run Mixed Linear Model analysis on the results of a clustered design - * `TTestClusteredAnalysis`: to run a t-test on aggregated data for clusters - * `PairedTTestClusteredAnalysis`: to run a paired t-test on aggregated data for clusters - * `ClusteredOLSAnalysis`: to run OLS analysis on the results of a clustered design - * `OLSAnalysis`: to run OLS analysis for non-clustered data - * `DeltaMethodAnalysis`: to run Delta Method Analysis for clustered designs - * `TargetAggregation`: to add pre-experimental data of the outcome to reduce variance - * `SyntheticControlAnalysis`: to run synthetic control analysis -* Regarding experiment analysis workflow: - * `Metric`: abstract class to define a metric to be used in the analysis - * `SimpleMetric`: to create a metric defined at the same level of the data used for the analysis - * `RatioMetric`: to create a metric defined at a lower level than the data used for the analysis - * `Variant`: to define a variant of the experiment - * `Dimension`: to define a dimension to slice the results of the experiment - * `HypothesisTest`: to define a Hypothesis Test with a metric, analysis method, optional analysis configuration, and optional dimensions - * `AnalysisPlan`: to define a plan of analysis with a list of Hypothesis Tests for a dataset and the experiment variants. The `analyze()` method runs the analysis and returns the results - * `AnalysisResults`: to store the results of the analysis -* Other: - * `PowerConfig`: to conveniently configure `PowerAnalysis` class - * `ConfidenceInterval`: to store the data representation of a confidence interval - * `InferenceResults`: to store the structure of complete statistical analysis results +πŸ“¦ **Installation:** +```sh +pip install cluster-experiments \ No newline at end of file diff --git a/docs/quickstart.md b/docs/quickstart.md new file mode 100644 index 00000000..7abf05c4 --- /dev/null +++ b/docs/quickstart.md @@ -0,0 +1,134 @@ +# Quickstart + +## Installation + +You can install **Cluster Experiments** via pip: + +```bash +pip install cluster-experiments +``` + +!!! info "Python Version Support" + **Cluster Experiments** requires **Python 3.9 or higher**. Make sure your environment meets this requirement before proceeding with the installation. + +--- + +## Usage + +Designing and analyzing experiments can feel overwhelming at times. After formulating a testable hypothesis, +you're faced with a series of routine tasks. From collecting and transforming raw data to measuring the statistical significance of your experiment results and constructing confidence intervals, +it can quickly become a repetitive and error-prone process. +*Cluster Experiments* is here to change that. Built on top of well-known packages like `pandas`, `numpy`, `scipy` and `statsmodels`, it automates the core steps of an experiment, streamlining your workflow, saving you time and effort, while maintaining statistical rigor. +## Key Features +- **Modular Design**: Each componentβ€”`Splitter`, `Perturbator`, and `Analysis`β€”is independent, reusable, and can be combined in any way you need. +- **Flexibility**: Whether you're conducting a simple A/B test or a complex clustered experiment, Cluster Experiments adapts to your needs. 
+- **Statistical Rigor**: Built-in support for advanced statistical methods ensures that your experiments maintain high standards, including clustered standard errors and variance reduction techniques like CUPED and CUPAC. + +The core functionality of *Cluster Experiments* revolves around several intuitive, self-contained classes and methods: + +- **Splitter**: Define how your control and treatment groups are split. +- **Perturbator**: Specify the type of effect you want to test. +- **Analysis**: Perform statistical inference to measure the impact of your experiment. + + +--- + +### `Splitter`: Defining Control and Treatment Groups + +The `Splitter` classes are responsible for dividing your data into control and treatment groups. The way you split your data depends on the **metric** (e.g., simple, ratio) you want to observe and the unit of observation (e.g., users, sessions, time periods). + +#### Features: + +- **Randomized Splits**: Simple random assignment of units to control and treatment groups. +- **Stratified Splits**: Ensure balanced representation of key segments (e.g., geographic regions, user cohorts). +- **Time-Based Splits**: Useful for switchback experiments or time-series data. + +```python +from cluster_experiments import RandomSplitter + +splitter = RandomSplitter( + cluster_cols=["cluster_id"], # Split by clusters + treatment_col="treatment", # Name of the treatment column +) +``` + +--- + +### `Perturbator`: Simulating the Treatment Effect + +The `Perturbator` classes define the type of effect you want to test. It simulates the treatment effect on your data, allowing you to evaluate the impact of your experiment. + +#### Features: + +- **Absolute Effects**: Add a fixed uplift to the treatment group. +- **Relative Effects**: Apply a percentage-based uplift to the treatment group. +- **Custom Effects**: Define your own effect size or distribution. + +```python +from cluster_experiments import ConstantPerturbator + +perturbator = ConstantPerturbator( + average_effect=5.0 # Simulate a nominal 5% uplift +) +``` + +--- + +### `Analysis`: Measuring the Impact + +Once your data is split and the treatment effect is applied, the `Analysis` component helps you measure the statistical significance of the experiment results. It provides tools for calculating effects, confidence intervals, and p-values. + +You can use it for both **experiment design** (pre-experiment phase) and **analysis** (post-experiment phase). + +#### Features: + +- **Statistical Tests**: Perform t-tests, OLS regression, and other hypothesis tests. +- **Effect Size**: Calculate both absolute and relative effects. +- **Confidence Intervals**: Construct confidence intervals for your results. + +Example: + +```python +from cluster_experiments import TTestClusteredAnalysis + +analysis = TTestClusteredAnalysis( + cluster_cols=["cluster_id"], # Cluster-level analysis + treatment_col="treatment", # Name of the treatment column + target_col="outcome" # Metric to analyze +) +``` + +--- + +### Putting It All Together for Experiment Design + +You can combine all classes as inputs in the `PowerAnalysis` class, where you can analyze different experiment settings, power lines, and Minimal Detectable Effects (MDEs). 
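+
+Under the hood, `PowerAnalysis` repeatedly splits the data with the splitter, injects the simulated effect with the perturbator, runs the analysis, and reports the share of statistically significant simulations as the estimated power.
+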
+ +```python +from cluster_experiments import PowerAnalysis +from cluster_experiments import RandomSplitter, ConstantPerturbator, TTestClusteredAnalysis + +# Define the components +splitter = RandomSplitter(cluster_cols=["cluster_id"], treatment_col="treatment") +perturbator = ConstantPerturbator(average_effect=0.1) +analysis = TTestClusteredAnalysis(cluster_cols=["cluster_id"], treatment_col="treatment", target_col="outcome") + +# Create the experiment +experiment = PowerAnalysis( + perturbator=perturbator, + splitter=splitter, + analysis=analysis, + target_col="outcome", + treatment_col="treatment" +) + +# Run the experiment +results = experiment.power_analysis() +``` + +--- + +## Next Steps + +- Explore the **Core Documentation** for detailed explanations of each component. +- Check out the **Usage Examples** for practical applications of the package. diff --git a/docs/stylesheets/overrides.css b/docs/stylesheets/overrides.css new file mode 100644 index 00000000..9b029d3c --- /dev/null +++ b/docs/stylesheets/overrides.css @@ -0,0 +1,13 @@ +/* Custom admonition styling */ +.md-typeset .admonition { + border-radius: 8px; + border-left: 4px solid var(--md-primary-fg-color); + box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1); +} + +/* Code block styling */ +.md-typeset pre { + border-radius: 8px; + background-color: var(--md-code-bg-color); + box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1); +} \ No newline at end of file diff --git a/docs/stylesheets/style.css b/docs/stylesheets/style.css new file mode 100644 index 00000000..c22863bb --- /dev/null +++ b/docs/stylesheets/style.css @@ -0,0 +1,9 @@ +/* Apply text justification to all paragraphs in the documentation */ +.md-content p { + text-align: justify; +} + +/* Optionally, justify lists or other specific elements */ +.md-content ul, .md-content ol { + text-align: justify; +} diff --git a/mkdocs.yml b/mkdocs.yml index 6ded671c..a329eb30 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -1,32 +1,17 @@ site_name: Cluster Experiments Docs -extra_css: [style.css] repo_url: https://github.com/david26694/cluster-experiments site_url: https://david26694.github.io/cluster-experiments/ site_description: Functions to design and run clustered experiments site_author: David Masip use_directory_urls: false edit_uri: blob/main/docs/ +docs_dir: docs +site_dir: site + nav: - - Home: - - Index: index.md - - Cupac example: cupac_example.ipynb - - Custom classes: create_custom_classes.ipynb - - Switchback: - - Stratified switchback: switchback.ipynb - - Switchback calendar visualization: plot_calendars.ipynb - - Visualization - 4-hour switches: plot_calendars_hours.ipynb - - Multiple treatments: multivariate.ipynb - - AA test clustered: aa_test.ipynb - - Paired T test: paired_ttest.ipynb - - Different hypotheses tests: analysis_with_different_hypotheses.ipynb - - Washover: washover_example.ipynb - - Normal Power: - - Compare with simulation: normal_power.ipynb - - Time-lines: normal_power_lines.ipynb - - Synthetic control: synthetic_control.ipynb - - Experiment analysis workflow: experiment_analysis.ipynb - - Delta Method Analysis: delta_method.ipynb - - API: + - Home: index.md + - Quickstart: quickstart.md + - Core Documentation: - Experiment analysis methods: api/experiment_analysis.md - Perturbators: api/perturbator.md - Splitter: api/random_splitter.md @@ -39,24 +24,65 @@ nav: - Dimension: api/dimension.md - Hypothesis Test: api/hypothesis_test.md - Analysis Plan: api/analysis_plan.md + - Usage Examples: + - CUPAC: cupac_example.ipynb + - Switchback: + - Stratified 
switchback: switchback.ipynb + - Switchback calendar visualization: plot_calendars.ipynb + - Visualization - 4-hour switches: plot_calendars_hours.ipynb + - Multiple treatments: multivariate.ipynb + - AA test clustered: aa_test.ipynb + - Paired T test: paired_ttest.ipynb + - Different hypotheses tests: analysis_with_different_hypotheses.ipynb + - Washover: washover_example.ipynb + - Normal Power: + - Compare with simulation: normal_power.ipynb + - Time-lines: normal_power_lines.ipynb + - Synthetic control: synthetic_control.ipynb + - Experiment analysis workflow: experiment_analysis.ipynb + - Delta Method Analysis: delta_method.ipynb + - Contribute: + - Contributing Guidelines: development/contributing.md + - Code Structure: development/code_structure.md + - Testing: development/testing.md + - Building Documentation: development/building_docs.md + +extra: + social: + - icon: fontawesome/brands/github + link: https://github.com/david26694/cluster-experiments + plugins: - mkdocstrings: watch: - cluster_experiments - mkdocs-jupyter - search + +extra_css: + - stylesheets/overrides.css + - stylesheets/style.css + copyright: Copyright © 2022 Maintained by David Masip. + theme: name: material font: text: Ubuntu code: Ubuntu Mono - feature: - tabs: true + features: + - content.tabs + - content.code.annotate + - navigation.instant + - navigation.tracking + - navigation.sections + - navigation.top palette: primary: indigo accent: blue + markdown_extensions: + - admonition - codehilite - pymdownx.inlinehilite - pymdownx.superfences @@ -66,4 +92,4 @@ markdown_extensions: - pymdownx.highlight: use_pygments: true - toc: - permalink: true + permalink: true \ No newline at end of file From 6363d80370679cd334ac04f85de66932f56caafe Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Wed, 26 Feb 2025 17:28:44 +0100 Subject: [PATCH 3/9] Add API subfolder --- mkdocs.yml | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/mkdocs.yml b/mkdocs.yml index a329eb30..8dfe50c9 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -12,18 +12,19 @@ nav: - Home: index.md - Quickstart: quickstart.md - Core Documentation: - - Experiment analysis methods: api/experiment_analysis.md - - Perturbators: api/perturbator.md - - Splitter: api/random_splitter.md - - Pre experiment outcome model: api/cupac_model.md - - Power config: api/power_config.md - - Power analysis: api/power_analysis.md - - Washover: api/washover.md - - Metric: api/metric.md - - Variant: api/variant.md - - Dimension: api/dimension.md - - Hypothesis Test: api/hypothesis_test.md - - Analysis Plan: api/analysis_plan.md + - API: + - Experiment analysis methods: api/experiment_analysis.md + - Perturbators: api/perturbator.md + - Splitter: api/random_splitter.md + - Pre experiment outcome model: api/cupac_model.md + - Power config: api/power_config.md + - Power analysis: api/power_analysis.md + - Washover: api/washover.md + - Metric: api/metric.md + - Variant: api/variant.md + - Dimension: api/dimension.md + - Hypothesis Test: api/hypothesis_test.md + - Analysis Plan: api/analysis_plan.md - Usage Examples: - CUPAC: cupac_example.ipynb - Switchback: From ba44999d35b0bc464c180e1ec0c9855a8b9ec1f5 Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Mon, 16 Jun 2025 12:20:29 +0200 Subject: [PATCH 4/9] Fix typos, white spaces and clarified README as per reviewer feedback --- README.md | 53 ++++++++++++++-------------------- docs/quickstart.md | 2 +- docs/stylesheets/overrides.css | 2 +- mkdocs.yml | 29 ++++++++++--------- 4 files changed, 38 
insertions(+), 48 deletions(-)

diff --git a/README.md b/README.md
index 38cf2eb9..aad07e43 100644
--- a/README.md
+++ b/README.md
@@ -11,47 +11,37 @@ https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg
 ![License](https://img.shields.io/github/license/david26694/cluster-experiments)
 [![Pypi version](https://img.shields.io/pypi/pyversions/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments)

-**`cluster experiments`** is a comprehensive Python library for end-to-end A/B testing workflows, designed for seamless integration with Pandas in production environments.
+**`cluster experiments`** is a comprehensive Python library for end-to-end A/B testing workflows.

 ---

-## πŸš€ Key Features
+## πŸš€ Key Features

-### πŸ“Œ **Experiment Design & Planning**
-- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation
-- Support for **complex experimental designs**, including:
-  - 🏒 **Cluster randomization**
-  - πŸ”„ **Switchback experiments**
+### πŸ“Œ Experiment Design & Planning
+- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation
+  - **Normal Approximation (CLT-based)**: Fast, analytical formulas assuming approximate normality
+    - Best for large sample sizes and standard A/B tests
+  - **Monte Carlo Simulation**: Empirically estimate power or MDE by simulating many experiments
+    - Ideal for complex or non-standard designs (e.g., clustering, non-normal outcomes)

-### πŸ›  **Data Preprocessing**
-- Tools for **efficient data preparation**
-- Seamlessly integrates with **Pandas** for streamlined workflows
+- Supports complex **experimental designs**, including:
+  - 🏒 **Cluster randomization**
+  - πŸ”„ **Switchback experiments**
+  - πŸ“Š **Observational studies**, including **synthetic control**

-### πŸ“Š **Comprehensive Experiment Analysis**
-##### **βœ… Metrics**
-- Simple and **ratio-based metrics** for evaluating experiment outcomes
+### πŸ§ͺ Statistical Methods for Analysis
+- πŸ“Œ **Ordinary Least Squares (OLS)** and **Clustered OLS**, with support for covariates
+- 🎯 **Variance Reduction Techniques**: **CUPED** and **CUPAC**

-##### **πŸ“ˆ Statistical Methods**
-- πŸ“Œ **Generalized Estimating Equations (GEE)**
-- πŸ“Œ **Mixed Linear Models** for robust inference
-- πŸ“Œ **Ordinary Least Squares (OLS)** and **Clustered OLS** with covariates
-- πŸ“Œ **T-tests** with variance reduction techniques (**CUPED, CUPAC**)
-- πŸ“Œ **Synthetic control methods** for causal inference in observational studies
+### πŸ“ˆ Scalable Experiment Analysis with Scorecards
+- Generate **Scorecards** to summarize experiment results, allowing analysis for multiple metrics
+- Include **confidence intervals, relative and absolute effect sizes, p-values**,

----
-
-### ⚑ Why Use `cluster experiments`?
-βœ… **Production-ready** – built for real-world applications
-βœ… **Data-driven decision-making** – designed for rigorous statistical analysis
-βœ… **Easy to use** – integrates effortlessly with Pandas
-
----
-
-`cluster experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows.
+`cluster experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows. 
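+
+For example, the CLT-based estimator can compute the MDE directly from historical data (a minimal sketch on synthetic data, mirroring the library's `NormalPowerAnalysis` API; use your own pre-experiment data in practice):
+
+```python
+import numpy as np
+import pandas as pd
+from cluster_experiments import NormalPowerAnalysis
+
+# Synthetic stand-in for your pre-experiment data
+df = pd.DataFrame({
+    "target": np.random.normal(0, 1, size=1_000),
+    "date": pd.to_datetime(
+        np.random.randint(
+            pd.Timestamp("2024-01-01").value,
+            pd.Timestamp("2024-01-31").value,
+            size=1_000,
+        )
+    ),
+})
+
+npw = NormalPowerAnalysis.from_dict({
+    "analysis": "ols",
+    "splitter": "non_clustered",
+    "time_col": "date",
+})
+mde = npw.mde(df, power=0.8)  # minimum detectable effect at 80% power
+```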
-πŸ”— **Get Started:** [Documentation Link] +πŸ”— **Get Started:** [Documentation Link] -πŸ“¦ **Installation:** +πŸ“¦ **Installation:** ```sh pip install cluster-experiments ======= @@ -67,7 +57,6 @@ mde_timeline = npw.mde_time_line( print(power, power_line_normal, power_normal, mde, mde_timeline) ``` - ### Experiment Analysis Example ```python diff --git a/docs/quickstart.md b/docs/quickstart.md index 7abf05c4..2b10e458 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -68,7 +68,7 @@ The `Perturbator` classes define the type of effect you want to test. It simulat from cluster_experiments import ConstantPerturbator perturbator = ConstantPerturbator( - average_effect=5.0 # Simulate a nominal 5% uplift + average_effect=5.0 # Simulate a nominal 5% uplift ) ``` diff --git a/docs/stylesheets/overrides.css b/docs/stylesheets/overrides.css index 9b029d3c..bf676b78 100644 --- a/docs/stylesheets/overrides.css +++ b/docs/stylesheets/overrides.css @@ -10,4 +10,4 @@ border-radius: 8px; background-color: var(--md-code-bg-color); box-shadow: 0 2px 4px rgba(0, 0, 0, 0.1); -} \ No newline at end of file +} diff --git a/mkdocs.yml b/mkdocs.yml index 8dfe50c9..db41ed48 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -9,7 +9,7 @@ docs_dir: docs site_dir: site nav: - - Home: index.md + - Home: ../README.md - Quickstart: quickstart.md - Core Documentation: - API: @@ -26,22 +26,23 @@ nav: - Hypothesis Test: api/hypothesis_test.md - Analysis Plan: api/analysis_plan.md - Usage Examples: - - CUPAC: cupac_example.ipynb + - Variance Reduction: + - CUPAC: cupac_example.ipynb - Switchback: - Stratified switchback: switchback.ipynb - Switchback calendar visualization: plot_calendars.ipynb - Visualization - 4-hour switches: plot_calendars_hours.ipynb - - Multiple treatments: multivariate.ipynb - - AA test clustered: aa_test.ipynb - - Paired T test: paired_ttest.ipynb - - Different hypotheses tests: analysis_with_different_hypotheses.ipynb - - Washover: washover_example.ipynb - - Normal Power: - - Compare with simulation: normal_power.ipynb - - Time-lines: normal_power_lines.ipynb - - Synthetic control: synthetic_control.ipynb - - Experiment analysis workflow: experiment_analysis.ipynb - - Delta Method Analysis: delta_method.ipynb + - Multiple treatments: multivariate.ipynb + - AA test clustered: aa_test.ipynb + - Paired T test: paired_ttest.ipynb + - Different hypotheses tests: analysis_with_different_hypotheses.ipynb + - Washover: washover_example.ipynb + - Normal Power: + - Compare with simulation: normal_power.ipynb + - Time-lines: normal_power_lines.ipynb + - Synthetic control: synthetic_control.ipynb + - Delta Method Analysis: delta_method.ipynb + - Experiment analysis workflow: experiment_analysis.ipynb - Contribute: - Contributing Guidelines: development/contributing.md - Code Structure: development/code_structure.md @@ -93,4 +94,4 @@ markdown_extensions: - pymdownx.highlight: use_pygments: true - toc: - permalink: true \ No newline at end of file + permalink: true From 349604904f5086876eb2cb6cc872cc1b17ee93d5 Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Sat, 1 Nov 2025 15:09:27 +0100 Subject: [PATCH 5/9] Quickstart and readme revamp --- .gitignore | 2 + README.md | 418 +++++++++++++--------- docs/examples/cluster_randomization.ipynb | 256 +++++++++++++ docs/examples/simple_ab_test.ipynb | 183 ++++++++++ docs/power_analysis_guide.md | 349 ++++++++++++++++++ docs/quickstart.md | 310 ++++++++++++---- mkdocs.yml | 85 +++-- 7 files changed, 1331 insertions(+), 272 deletions(-) create mode 100644 
docs/examples/cluster_randomization.ipynb create mode 100644 docs/examples/simple_ab_test.ipynb create mode 100644 docs/power_analysis_guide.md diff --git a/.gitignore b/.gitignore index e40cd467..208e6e22 100644 --- a/.gitignore +++ b/.gitignore @@ -171,3 +171,5 @@ todos.txt # experiments/ cluster-experiments.code-workspace +QUICKSTART_RESTRUCTURE.md +DOCUMENTATION_REVAMP_SUMMARY.md diff --git a/README.md b/README.md index aad07e43..08e56291 100644 --- a/README.md +++ b/README.md @@ -1,139 +1,231 @@ -# cluster_experiments +# cluster-experiments [![Downloads](https://static.pepy.tech/badge/cluster-experiments)](https://pepy.tech/project/cluster-experiments) -[![PyPI](https://img.shields.io/pypi/v/cluster-experiments)]( -https://pypi.org/project/cluster-experiments/) +[![PyPI](https://img.shields.io/pypi/v/cluster-experiments)](https://pypi.org/project/cluster-experiments/) [![Unit tests](https://github.com/david26694/cluster-experiments/workflows/Release%20unit%20Tests/badge.svg)](https://github.com/david26694/cluster-experiments/actions) -[![CodeCov]( -https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg)](https://app.codecov.io/gh/david26694/cluster-experiments/) +[![CodeCov](https://codecov.io/gh/david26694/cluster-experiments/branch/main/graph/badge.svg)](https://app.codecov.io/gh/david26694/cluster-experiments/) ![License](https://img.shields.io/github/license/david26694/cluster-experiments) [![Pypi version](https://img.shields.io/pypi/pyversions/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments) -**`cluster experiments`** is a comprehensive Python library for end-to-end A/B testing workflows. +**`cluster-experiments`** is a comprehensive Python library for **end-to-end A/B testing workflows**, from experiment design to statistical analysis. ---- +## πŸ“– What is cluster-experiments? -## πŸš€ Key Features +`cluster-experiments` provides a complete toolkit for designing, running, and analyzing experiments, with particular strength in handling **clustered randomization** and complex experimental designs. Originally developed to address challenges in **switchback experiments** and scenarios with **network effects** where standard randomization isn't feasible, it has evolved into a general-purpose experimentation framework supporting both simple A/B tests and sophisticated designs. -### πŸ“Œ Experiment Design & Planning -- **Power analysis** and **Minimal Detectable Effect (MDE)** estimation - - **Normal Approximation (CLT-based)**: Fast, analytical formulas assuming approximate normality - - Best for large sample sizes and standard A/B tests - - **Monte Carlo Simulation**: Empirically estimate power or MDE by simulating many experiments - - Ideal for complex or non-standard designs (e.g., clustering, non-normal outcomes) +### Why "cluster"? -- Supports complex **experimental designs**, including: - - 🏒 **Cluster randomization** - - πŸ”„ **Switchback experiments** - - πŸ“Š **Observational studies**, including **synthetic control** +The name reflects the library's origins in handling **cluster-randomized experiments**, where randomization happens at a group level (e.g., stores, cities, time periods) rather than at the individual level. 
This is critical when: -### πŸ§ͺ Statistical Methods for Analysis -- πŸ“Œ **Ordinary Least Squares (OLS)** and **Clustered OLS**, with support for covariates -- 🎯 **Variance Reduction Techniques**: **CUPED** and **CUPAC** +- **Spillover/Network Effects**: Treatment of one unit affects others (e.g., testing driver incentives in ride-sharing) +- **Operational Constraints**: You can't randomize individuals (e.g., testing restaurant menu changes) +- **Switchback Designs**: Treatment alternates over time periods within the same unit -### πŸ“ˆ Scalable Experiment Analysis with Scorecards -- Generate **Scorecards** to summarize experiment results, allowing analysis for multiple metrics -- Include **confidence intervals, relative and absolute effect sizes, p-values**, +While the library excels at these complex scenarios, it's equally capable of handling standard A/B tests with individual-level randomization. -`cluster experiments` empowers analysts and data scientists with **scalable, reproducible, and statistically robust** A/B testing workflows. +--- -πŸ”— **Get Started:** [Documentation Link] +## πŸš€ Key Features -πŸ“¦ **Installation:** -```sh -pip install cluster-experiments -======= -# MDE calculation -mde = npw.mde(df, power=0.8) - -# MDE line with length -mde_timeline = npw.mde_time_line( - df, - powers=[0.8], - experiment_length=[7, 14, 21] -) +### πŸ“Š **Comprehensive Experiment Design** +- **Power Analysis & Sample Size Calculation** + - Simulation-based (Monte Carlo) for any design complexity + - Analytical (CLT-based) for standard designs + - Minimal Detectable Effect (MDE) estimation + +- **Multiple Experimental Designs** + - Standard A/B tests with individual randomization + - Cluster-randomized experiments + - Switchback/crossover experiments + - Stratified randomization + - Observational studies with Synthetic Control + +### πŸ”¬ **Advanced Statistical Methods** +- **Multiple Analysis Methods** + - OLS and Clustered OLS regression + - T-tests and Paired T-tests + - GEE (Generalized Estimating Equations) + - Mixed Linear Models (MLM) + - Delta Method for ratio metrics + - Synthetic Control for observational data + +- **Variance Reduction Techniques** + - CUPED (Controlled-experiment Using Pre-Experiment Data) + - CUPAC (CUPED with Pre-experiment Aggregations) + - Covariate adjustment + +### πŸ“ˆ **Scalable Analysis Workflow** +- **Scorecard Generation**: Analyze multiple metrics simultaneously +- **Multi-dimensional Slicing**: Break down results by segments +- **Multiple Treatment Arms**: Compare several treatments at once +- **Ratio Metrics**: Built-in support for conversion rates, averages, etc. -print(power, power_line_normal, power_normal, mde, mde_timeline) +--- + +## πŸ“¦ Installation + +```bash +pip install cluster-experiments ``` -### Experiment Analysis Example + +--- + +## ⚑ Quick Example + +Here's a simple example showing how to analyze an experiment with two metrics: a simple metric (conversions) and a ratio metric (conversion rate). 
```python -import numpy as np import pandas as pd +import numpy as np from cluster_experiments import AnalysisPlan -N = 1_000 -experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "delivery_time": np.random.normal(10, 1, size=N), - "experiment_group": np.random.choice(["control", "treatment"], size=N), - "city": np.random.choice(["NYC", "LA"], size=N), - "customer_id": np.random.randint(1, 100, size=N), - "customer_age": np.random.randint(20, 60, size=N), +# Simulate experiment data +np.random.seed(42) +n_users = 1000 + +data = pd.DataFrame({ + 'user_id': range(n_users), + 'variant': np.random.choice(['control', 'treatment'], n_users), + 'orders': np.random.poisson(2.5, n_users), # Number of orders (simple metric) + 'visits': np.random.poisson(10, n_users), # Number of visits (for ratio) }) -# Create analysis plan -plan = AnalysisPlan.from_metrics_dict({ - "metrics": [ - {"alias": "AOV", "name": "order_value"}, - {"alias": "delivery_time", "name": "delivery_time"}, - ], - "variants": [ - {"name": "control", "is_control": True}, - {"name": "treatment", "is_control": False}, +# Add a small treatment effect to orders +data.loc[data['variant'] == 'treatment', 'orders'] += np.random.poisson(0.5, (data['variant'] == 'treatment').sum()) + +# Calculate conversions (users who ordered) +data['converted'] = (data['orders'] > 0).astype(int) + +# Define analysis plan +analysis_plan = AnalysisPlan.from_metrics_dict({ + 'metrics': [ + # Simple metric: total conversions + { + 'alias': 'conversions', + 'name': 'converted', + 'metric_type': 'simple' + }, + # Ratio metric: conversion rate (conversions / visits) + { + 'alias': 'conversion_rate', + 'metric_type': 'ratio', + 'numerator': 'converted', + 'denominator': 'visits' + }, ], - "variant_col": "experiment_group", - "alpha": 0.05, - "dimensions": [ - {"name": "city", "values": ["NYC", "LA"]}, + 'variants': [ + {'name': 'control', 'is_control': True}, + {'name': 'treatment', 'is_control': False}, ], - "analysis_type": "clustered_ols", - "analysis_config": {"cluster_cols": ["customer_id"]}, + 'variant_col': 'variant', + 'analysis_type': 'ols', # Use OLS for simple A/B test }) + # Run analysis -print(plan.analyze(experiment_data).to_dataframe()) +results = analysis_plan.analyze(data) + +# View results as a dataframe +print(results.to_dataframe()) ``` -### Variance Reduction Example +**Output**: A comprehensive scorecard with treatment effects, confidence intervals, and p-values for each metric: -```python -import numpy as np -import pandas as pd -from cluster_experiments import ( - AnalysisPlan, - SimpleMetric, - Variant, - Dimension, - TargetAggregation, - HypothesisTest -) +``` + metric control_mean treatment_mean ... p_value ci_lower ci_upper +0 conversions 0.485 0.532 ... 0.023 0.006 0.088 +1 conversion_rate 0.048 0.053 ... 
0.031 0.0004 0.009 +``` -N = 1000 +This simple example demonstrates: +- βœ… Working with both **simple** and **ratio metrics** +- βœ… Easy experiment setup with **dictionary-based configuration** +- βœ… Statistical inference with **confidence intervals and p-values** +- βœ… **Automatic scorecard generation** for multiple metrics -experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "delivery_time": np.random.normal(10, 1, size=N), - "experiment_group": np.random.choice(["control", "treatment"], size=N), - "city": np.random.choice(["NYC", "LA"], size=N), - "customer_id": np.random.randint(1, 100, size=N), - "customer_age": np.random.randint(20, 60, size=N), -}) +--- -pre_experiment_data = pd.DataFrame({ - "order_value": np.random.normal(100, 10, size=N), - "customer_id": np.random.randint(1, 100, size=N), -}) +## πŸ“š Documentation + +For detailed guides, API references, and advanced examples, visit our [**documentation**](https://david26694.github.io/cluster-experiments/). + +### Key Resources +- [**Quickstart Guide**](https://david26694.github.io/cluster-experiments/quickstart.html): Get up and running in minutes +- [**API Reference**](https://david26694.github.io/cluster-experiments/api/experiment_analysis.html): Detailed class and method documentation +- [**Example Gallery**](https://david26694.github.io/cluster-experiments/cupac_example.html): Real-world use cases and patterns + +--- + +## 🎯 Core Concepts + +The library is built around three main components: + +### 1. **Splitter** - Define how to randomize +Choose how to split your data into control and treatment groups: +- `NonClusteredSplitter`: Standard individual-level randomization +- `ClusteredSplitter`: Cluster-level randomization +- `SwitchbackSplitter`: Time-based alternating treatments +- `StratifiedClusteredSplitter`: Balance randomization across strata + +### 2. **Analysis** - Measure the impact +Select the appropriate statistical method for your design: +- `OLSAnalysis`: Standard regression for A/B tests +- `ClusteredOLSAnalysis`: Clustered standard errors for cluster-randomized designs +- `TTestClusteredAnalysis`: T-tests on cluster-aggregated data +- `GeeExperimentAnalysis`: GEE for correlated observations +- `SyntheticControlAnalysis`: Observational studies with synthetic controls + +### 3. **AnalysisPlan** - Orchestrate your analysis +Define your complete analysis workflow: +- Specify metrics (simple and ratio) +- Define variants and dimensions +- Configure hypothesis tests +- Generate comprehensive scorecards + +For **power analysis**, combine these with: +- **Perturbator**: Simulate treatment effects for power calculations +- **PowerAnalysis**: Estimate statistical power and sample sizes -# Define test +--- + +## πŸ” When to Use cluster-experiments + +βœ… **Use cluster-experiments when you need to:** +- Design and analyze **cluster-randomized experiments** +- Handle **switchback/crossover designs** +- Account for **network effects or spillover** +- Perform **power analysis** for complex designs +- Reduce variance with **CUPED/CUPAC** +- Analyze **multiple metrics** with dimensional slicing +- Work with **ratio metrics** (rates, averages, etc.) 
+
+πŸ“Š **Perfect for:**
+- Marketplace/platform experiments (drivers, restaurants, stores)
+- Geographic experiments (cities, regions)
+- Time-based tests (switchbacks, dayparting)
+- Standard A/B tests with advanced analysis needs
+
+---
+
+## πŸ› οΈ Advanced Features
+
+### Variance Reduction with CUPAC
+
+Reduce variance by leveraging pre-experiment data:
+
+```python
+from cluster_experiments import AnalysisPlan, TargetAggregation, HypothesisTest, SimpleMetric, Variant
+
+# Define CUPAC model
 cupac_model = TargetAggregation(
     agg_col="customer_id",
     target_col="order_value"
 )

-# Define test
-hypothesis_test = HypothesisTest(
-    metric=SimpleMetric(alias="AOV", name="order_value"),
+# Create hypothesis test with CUPAC
+test = HypothesisTest(
+    metric=SimpleMetric(alias="revenue", name="order_value"),
     analysis_type="clustered_ols",
     analysis_config={
         "cluster_cols": ["customer_id"],
@@ -145,81 +237,81 @@ hypothesis_test = HypothesisTest(
         "covariates": ["customer_age", "estimate_order_value"],
     },
     cupac_config={
         "cupac_model": cupac_model,
         "target_col": "order_value",
     },
 )

-# Create analysis plan
 plan = AnalysisPlan(
-    tests=[hypothesis_test],
-    variants=[
-        Variant("control", is_control=True),
-        Variant("treatment", is_control=False),
-    ],
-    variant_col="experiment_group",
+    tests=[test],
+    variants=[Variant("control", is_control=True), Variant("treatment")],
+    variant_col="variant",
 )

-# Run analysis
-results = plan.analyze(experiment_data, pre_experiment_data)
-print(results.to_dataframe())
+# Analyze with pre-experiment data
+results = plan.analyze(experiment_df, pre_experiment_df)
 ```

-## Installation
+### Power Analysis

-You can install this package via `pip`.
+Estimate the power of your experiment design:

-```bash
-pip install cluster-experiments
+```python
+from cluster_experiments import PowerAnalysis, NormalPowerAnalysis
+from cluster_experiments import ClusteredSplitter, ConstantPerturbator, ClusteredOLSAnalysis
+
+# Simulation-based power analysis
+power_sim = PowerAnalysis(
+    splitter=ClusteredSplitter(cluster_cols=['city']),
+    perturbator=ConstantPerturbator(average_effect=0.1),
+    analysis=ClusteredOLSAnalysis(cluster_cols=['city']),
+    n_simulations=1000
+)
+
+power = power_sim.power_analysis(historical_data, average_effect=0.1)
+print(f"Estimated power: {power:.2%}")
+
+# Analytical power analysis (faster for standard designs)
+power_analytical = NormalPowerAnalysis.from_dict({
+    'splitter': 'clustered',
+    'cluster_cols': ['city'],
+    'analysis': 'clustered_ols'
+})
+
+mde = power_analytical.mde(historical_data, power=0.8)
+print(f"Minimum Detectable Effect at 80% power: {mde:.4f}")
 ```

-For detailed documentation and examples, visit our [documentation site](https://david26694.github.io/cluster-experiments/).
-
-## Features
-
-The library offers the following classes:
-
-* Regarding power analysis:
-  * `PowerAnalysis`: to run power analysis on any experiment design, using simulation
-  * `PowerAnalysisWithPreExperimentData`: to run power analysis on a clustered/switchback design, but adding pre-experiment df during split and perturbation (especially useful for Synthetic Control)
-  * `NormalPowerAnalysis`: to run power analysis on any experiment design using the central limit theorem for the distribution of the estimator. It can be used to compute the minimum detectable effect (MDE) for a given power level. 
- * `ConstantPerturbator`: to artificially perturb treated group with constant perturbations - * `BinaryPerturbator`: to artificially perturb treated group for binary outcomes - * `RelativePositivePerturbator`: to artificially perturb treated group with relative positive perturbations - * `RelativeMixedPerturbator`: to artificially perturb treated group with relative perturbations for positive and negative targets - * `NormalPerturbator`: to artificially perturb treated group with normal distribution perturbations - * `BetaRelativePositivePerturbator`: to artificially perturb treated group with relative positive beta distribution perturbations - * `BetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval - * `SegmentedBetaRelativePerturbator`: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval, but using clusters -* Regarding splitting data: - * `ClusteredSplitter`: to split data based on clusters - * `FixedSizeClusteredSplitter`: to split data based on clusters with a fixed size (example: only 1 treatment cluster and the rest in control) - * `BalancedClusteredSplitter`: to split data based on clusters in a balanced way - * `NonClusteredSplitter`: Regular data splitting, no clusters - * `StratifiedClusteredSplitter`: to split based on clusters and strata, balancing the number of clusters in each stratus - * `RepeatedSampler`: for backtests where we have access to counterfactuals, does not split the data, just duplicates the data for all groups - * Switchback splitters (the same can be done with clustered splitters, but there is a convenient way to define switchback splitters using switch frequency): - * `SwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments - * `BalancedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing treatment and control among all clusters - * `StratifiedSwitchbackSplitter`: to split data based on clusters and dates, for switchback experiments, balancing the number of clusters in each stratus - * Washover for switchback experiments: - * `EmptyWashover`: no washover done at all. - * `ConstantWashover`: accepts a timedelta parameter and removes the data when we switch from A to B for the timedelta interval. 
-* Regarding analysis methods: - * `GeeExperimentAnalysis`: to run GEE analysis on the results of a clustered design - * `MLMExperimentAnalysis`: to run Mixed Linear Model analysis on the results of a clustered design - * `TTestClusteredAnalysis`: to run a t-test on aggregated data for clusters - * `PairedTTestClusteredAnalysis`: to run a paired t-test on aggregated data for clusters - * `ClusteredOLSAnalysis`: to run OLS analysis on the results of a clustered design - * `OLSAnalysis`: to run OLS analysis for non-clustered data - * `DeltaMethodAnalysis`: to run Delta Method Analysis for clustered designs - * `TargetAggregation`: to add pre-experimental data of the outcome to reduce variance - * `SyntheticControlAnalysis`: to run synthetic control analysis -* Regarding experiment analysis workflow: - * `Metric`: abstract class to define a metric to be used in the analysis - * `SimpleMetric`: to create a metric defined at the same level of the data used for the analysis - * `RatioMetric`: to create a metric defined at a lower level than the data used for the analysis - * `Variant`: to define a variant of the experiment - * `Dimension`: to define a dimension to slice the results of the experiment - * `HypothesisTest`: to define a Hypothesis Test with a metric, analysis method, optional analysis configuration, and optional dimensions - * `AnalysisPlan`: to define a plan of analysis with a list of Hypothesis Tests for a dataset and the experiment variants. The `analyze()` method runs the analysis and returns the results - * `AnalysisResults`: to store the results of an analysis -* Other: - * `PowerConfig`: to conveniently configure `PowerAnalysis` class - * `ConfidenceInterval`: to store the data representation of a confidence interval - * `InferenceResults`: to store the structure of complete statistical analysis results +--- + +## 🀝 Contributing + +We welcome contributions! See our [Contributing Guidelines](CONTRIBUTING.md) for details on how to: +- Report bugs +- Suggest features +- Submit pull requests +- Write documentation + +--- + +## πŸ“„ License + +This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. 
+ +--- + +## 🌟 Support + +- ⭐ Star us on [GitHub](https://github.com/david26694/cluster-experiments) +- πŸ“ Read the [documentation](https://david26694.github.io/cluster-experiments/) +- πŸ› Report issues on our [issue tracker](https://github.com/david26694/cluster-experiments/issues) +- πŸ’¬ Join discussions in [GitHub Discussions](https://github.com/david26694/cluster-experiments/discussions) + +--- + +## πŸ“š Citation + +If you use cluster-experiments in your research, please cite: + +```bibtex +@software{cluster_experiments, + author = {David Masip and contributors}, + title = {cluster-experiments: A Python library for designing and analyzing experiments}, + url = {https://github.com/david26694/cluster-experiments}, + year = {2022} +} +``` diff --git a/docs/examples/cluster_randomization.ipynb b/docs/examples/cluster_randomization.ipynb new file mode 100644 index 00000000..3ca950cb --- /dev/null +++ b/docs/examples/cluster_randomization.ipynb @@ -0,0 +1,256 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Cluster Randomization Example\n", + "\n", + "This notebook demonstrates how to analyze a **cluster-randomized experiment** where randomization occurs at the group level (e.g., stores, cities, schools) rather than at the individual level.\n", + "\n", + "## Why Cluster Randomization?\n", + "\n", + "Cluster randomization is necessary when:\n", + "\n", + "1. **Spillover Effects**: Treatment of one individual affects others (e.g., testing driver incentives in ride-sharing)\n", + "2. **Operational Constraints**: You can't randomize at the individual level (e.g., testing a store layout)\n", + "3. **Cost Efficiency**: It's cheaper to randomize groups than individuals\n", + "\n", + "## Key Consideration\n", + "\n", + "With cluster randomization, you need to account for **intra-cluster correlation** - observations within the same cluster are more similar than observations from different clusters. This requires using **clustered standard errors** or cluster-level analysis methods.\n", + "\n", + "## Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from cluster_experiments import AnalysisPlan\n", + "\n", + "# Set random seed for reproducibility\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Simulate Cluster-Randomized Experiment\n", + "\n", + "Let's simulate an experiment where we test a promotional campaign across different stores. 
Each store is randomly assigned to control or treatment, and we observe multiple transactions per store.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Define parameters\n", + "n_stores = 50 # Number of stores (clusters)\n", + "transactions_per_store = 100 # Average transactions per store\n", + "\n", + "# Step 1: Randomly assign stores to treatment\n", + "stores = pd.DataFrame({\n", + " 'store_id': range(n_stores),\n", + " 'variant': np.random.choice(['control', 'treatment'], n_stores),\n", + "})\n", + "\n", + "# Step 2: Generate transaction-level data\n", + "transactions = []\n", + "for _, store in stores.iterrows():\n", + " n_transactions = np.random.poisson(transactions_per_store)\n", + " \n", + " # Base purchase amount\n", + " base_amount = 50\n", + " \n", + " # Treatment effect: +$5 average purchase\n", + " treatment_effect = 5 if store['variant'] == 'treatment' else 0\n", + " \n", + " # Store-level random effect (intra-cluster correlation)\n", + " store_effect = np.random.normal(0, 10)\n", + " \n", + " # Generate transactions\n", + " store_transactions = pd.DataFrame({\n", + " 'store_id': store['store_id'],\n", + " 'variant': store['variant'],\n", + " 'purchase_amount': np.random.normal(\n", + " base_amount + treatment_effect + store_effect, \n", + " 20, \n", + " n_transactions\n", + " ).clip(min=0) # No negative purchases\n", + " })\n", + " \n", + " transactions.append(store_transactions)\n", + "\n", + "data = pd.concat(transactions, ignore_index=True)\n", + "\n", + "print(f\"Total transactions: {len(data):,}\")\n", + "print(f\"Stores in control: {(stores['variant'] == 'control').sum()}\")\n", + "print(f\"Stores in treatment: {(stores['variant'] == 'treatment').sum()}\")\n", + "print(f\"\\nFirst few rows:\")\n", + "data.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Naive Analysis (WRONG!)\n", + "\n", + "First, let's see what happens if we ignore the clustering and use standard OLS. **This is wrong** because it doesn't account for intra-cluster correlation and will give you incorrect standard errors (typically too small, leading to false positives).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Naive analysis without clustering\n", + "naive_plan = AnalysisPlan.from_metrics_dict({\n", + " 'metrics': [\n", + " {\n", + " 'alias': 'purchase_amount',\n", + " 'name': 'purchase_amount',\n", + " 'metric_type': 'simple'\n", + " },\n", + " ],\n", + " 'variants': [\n", + " {'name': 'control', 'is_control': True},\n", + " {'name': 'treatment', 'is_control': False},\n", + " ],\n", + " 'variant_col': 'variant',\n", + " 'analysis_type': 'ols', # Standard OLS (WRONG for clustered data!)\n", + "})\n", + "\n", + "naive_results = naive_plan.analyze(data).to_dataframe()\n", + "print(\"=== Naive Analysis (Ignoring Clusters) ===\")\n", + "print(f\"Treatment Effect: ${naive_results.iloc[0]['ate']:.2f}\")\n", + "print(f\"Standard Error: ${naive_results.iloc[0]['ate_se']:.2f}\")\n", + "print(f\"P-value: {naive_results.iloc[0]['p_value']:.4f}\")\n", + "print(f\"95% CI: [${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Correct Analysis with Clustered Standard Errors\n", + "\n", + "Now let's do the **correct** analysis by accounting for the clustering. 
We'll use `clustered_ols` which computes cluster-robust standard errors.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Correct analysis with clustered standard errors\n", + "clustered_plan = AnalysisPlan.from_metrics_dict({\n", + " 'metrics': [\n", + " {\n", + " 'alias': 'purchase_amount',\n", + " 'name': 'purchase_amount',\n", + " 'metric_type': 'simple'\n", + " },\n", + " ],\n", + " 'variants': [\n", + " {'name': 'control', 'is_control': True},\n", + " {'name': 'treatment', 'is_control': False},\n", + " ],\n", + " 'variant_col': 'variant',\n", + " 'analysis_type': 'clustered_ols', # Clustered OLS (CORRECT!)\n", + " 'analysis_config': {\n", + " 'cluster_cols': ['store_id'] # Specify the clustering variable\n", + " }\n", + "})\n", + "\n", + "clustered_results = clustered_plan.analyze(data).to_dataframe()\n", + "print(\"=== Correct Analysis (With Clustering) ===\")\n", + "print(f\"Treatment Effect: ${clustered_results.iloc[0]['ate']:.2f}\")\n", + "print(f\"Standard Error: ${clustered_results.iloc[0]['ate_se']:.2f}\")\n", + "print(f\"P-value: {clustered_results.iloc[0]['p_value']:.4f}\")\n", + "print(f\"95% CI: [${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Compare Results\n", + "\n", + "Let's compare the two approaches side by side:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "comparison = pd.DataFrame({\n", + " 'Method': ['Naive (OLS)', 'Correct (Clustered OLS)'],\n", + " 'Treatment Effect': [\n", + " f\"${naive_results.iloc[0]['ate']:.2f}\",\n", + " f\"${clustered_results.iloc[0]['ate']:.2f}\"\n", + " ],\n", + " 'Standard Error': [\n", + " f\"${naive_results.iloc[0]['ate_se']:.2f}\",\n", + " f\"${clustered_results.iloc[0]['ate_se']:.2f}\"\n", + " ],\n", + " 'P-value': [\n", + " f\"{naive_results.iloc[0]['p_value']:.4f}\",\n", + " f\"{clustered_results.iloc[0]['p_value']:.4f}\"\n", + " ],\n", + " '95% CI': [\n", + " f\"[${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\",\n", + " f\"[${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\"\n", + " ]\n", + "})\n", + "\n", + "print(\"\\n=== Comparison ===\")\n", + "print(comparison.to_string(index=False))\n", + "print(\"\\nNotice: The clustered standard errors are LARGER, reflecting the\")\n", + "print(\"additional uncertainty from intra-cluster correlation.\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Key Takeaways\n", + "\n", + "1. **Always account for clustering** in your analysis when randomization happens at the cluster level\n", + "2. **Clustered standard errors are typically larger** than naive standard errors\n", + "3. **Ignoring clustering leads to overstated confidence** - you might claim significance when there isn't any\n", + "4. 
**Use `clustered_ols` analysis type** and specify `cluster_cols` in the analysis config\n", + "\n", + "## When to Use Clustering\n", + "\n", + "Use clustered analysis when:\n", + "- βœ… Randomization is at the group level (stores, cities, schools)\n", + "- βœ… There are spillover effects between individuals\n", + "- βœ… Observations within groups are more similar than across groups\n", + "\n", + "Don't use clustering when:\n", + "- ❌ Randomization is truly at the individual level\n", + "- ❌ There's no reason to believe observations are correlated within groups\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/examples/simple_ab_test.ipynb b/docs/examples/simple_ab_test.ipynb new file mode 100644 index 00000000..e7b4c6a7 --- /dev/null +++ b/docs/examples/simple_ab_test.ipynb @@ -0,0 +1,183 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Simple A/B Test Example\n", + "\n", + "This notebook demonstrates a basic A/B test analysis using `cluster-experiments`.\n", + "\n", + "## Overview\n", + "\n", + "We'll simulate an experiment where we test a new feature's impact on:\n", + "- **Conversions** (simple metric): Whether a user made a purchase\n", + "- **Conversion Rate** (ratio metric): Conversions per visit\n", + "- **Revenue** (simple metric): Total revenue generated\n", + "\n", + "## Setup\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from cluster_experiments import AnalysisPlan\n", + "\n", + "# Set random seed for reproducibility\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Generate Simulated Experiment Data\n", + "\n", + "Let's create a dataset with control and treatment groups.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n_users = 2000\n", + "\n", + "# Create base data\n", + "data = pd.DataFrame({\n", + " 'user_id': range(n_users),\n", + " 'variant': np.random.choice(['control', 'treatment'], n_users),\n", + " 'visits': np.random.poisson(10, n_users), # Number of visits\n", + "})\n", + "\n", + "# Simulate conversions (more likely for treatment)\n", + "data['converted'] = (\n", + " np.random.binomial(1, 0.10, n_users) | # Base conversion rate\n", + " (data['variant'] == 'treatment') & np.random.binomial(1, 0.03, n_users) # +3% for treatment\n", + ").astype(int)\n", + "\n", + "# Simulate revenue (higher for converters and treatment)\n", + "data['revenue'] = 0.0\n", + "converters = data['converted'] == 1\n", + "data.loc[converters, 'revenue'] = np.random.gamma(shape=2, scale=25, size=converters.sum())\n", + "\n", + "# Treatment group gets slightly higher revenue\n", + "treatment_converters = (data['variant'] == 'treatment') & converters\n", + "data.loc[treatment_converters, 'revenue'] *= 1.15\n", + "\n", + "print(f\"Dataset shape: {data.shape}\")\n", + "print(f\"\\nFirst few rows:\")\n", + "data.head(10)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. 
Define Analysis Plan\n", + "\n", + "Now let's define our analysis plan with multiple metrics:\n", + "- **conversions**: Simple metric counting total conversions\n", + "- **conversion_rate**: Ratio metric (conversions / visits)\n", + "- **revenue**: Simple metric for total revenue\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "analysis_plan = AnalysisPlan.from_metrics_dict({\n", + " 'metrics': [\n", + " # Simple metric: total conversions\n", + " {\n", + " 'alias': 'conversions',\n", + " 'name': 'converted',\n", + " 'metric_type': 'simple'\n", + " },\n", + " # Ratio metric: conversion rate\n", + " {\n", + " 'alias': 'conversion_rate', \n", + " 'metric_type': 'ratio',\n", + " 'numerator': 'converted',\n", + " 'denominator': 'visits'\n", + " },\n", + " # Simple metric: total revenue\n", + " {\n", + " 'alias': 'revenue',\n", + " 'name': 'revenue',\n", + " 'metric_type': 'simple'\n", + " },\n", + " ],\n", + " 'variants': [\n", + " {'name': 'control', 'is_control': True},\n", + " {'name': 'treatment', 'is_control': False},\n", + " ],\n", + " 'variant_col': 'variant',\n", + " 'analysis_type': 'ols', # Use OLS for simple A/B test\n", + " 'alpha': 0.05, # 95% confidence level\n", + "})\n", + "\n", + "print(\"Analysis plan created successfully!\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Run Analysis\n", + "\n", + "Let's run the analysis and generate a comprehensive scorecard.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Run analysis\n", + "results = analysis_plan.analyze(data)\n", + "\n", + "# View results as a dataframe\n", + "results_df = results.to_dataframe()\n", + "print(\"\\n=== Experiment Results ===\")\n", + "results_df\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "This example demonstrated:\n", + "\n", + "1. βœ… **Data Simulation**: Creating realistic experiment data\n", + "2. βœ… **Multiple Metric Types**: Analyzing both simple and ratio metrics\n", + "3. βœ… **Easy Configuration**: Using dictionary-based analysis plan setup\n", + "4. βœ… **Comprehensive Results**: Getting treatment effects, confidence intervals, and p-values\n", + "\n", + "## Next Steps\n", + "\n", + "- Try the [CUPAC example](../cupac_example.html) to learn about variance reduction\n", + "- Explore [cluster randomization](cluster_randomization.html) for handling correlated units\n", + "- Learn about [switchback experiments](../switchback.html) for time-based designs\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/power_analysis_guide.md b/docs/power_analysis_guide.md new file mode 100644 index 00000000..190b4a35 --- /dev/null +++ b/docs/power_analysis_guide.md @@ -0,0 +1,349 @@ +# Power Analysis Guide + +This guide explains how to design experiments using **power analysis** to determine sample sizes and experiment duration. + +--- + +## What is Power Analysis? + +**Power analysis** helps you answer questions like: +- How many users do I need to detect a 5% lift? +- How long should I run my experiment? +- What's the smallest effect I can reliably detect? + +This is done **before** running your experiment, using historical data to simulate different scenarios. 
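+
+As a preview, those questions map directly onto the power-analysis API. The following is a minimal sketch using the `NormalPowerAnalysis` interface covered later in this guide; it assumes `historical_data` is a pandas DataFrame with a `target` column and a `date` column:
+
+```python
+from cluster_experiments import NormalPowerAnalysis
+
+npw = NormalPowerAnalysis.from_dict({
+    'analysis': 'ols',
+    'splitter': 'non_clustered',
+    'target_col': 'target',
+    'time_col': 'date',
+})
+
+# "What's the smallest effect I can reliably detect?"
+mde = npw.mde(historical_data, power=0.8)
+
+# "Can I detect a 5% lift with the data I have?"
+power = npw.power_analysis(historical_data, average_effect=0.05)
+
+# "How long should I run my experiment?" (MDE vs. experiment length)
+mde_by_length = npw.mde_time_line(
+    historical_data, powers=[0.8], experiment_length=[7, 14, 28]
+)
+```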
+ +--- + +## When to Use Power Analysis + +Use power analysis when: +- βœ… **Planning an experiment**: Determine required sample size +- βœ… **Evaluating feasibility**: Check if an effect is detectable with available data +- βœ… **Optimizing duration**: Balance statistical power with business needs + +Don't need power analysis when: +- ❌ Analyzing completed experiments (use `AnalysisPlan` instead) +- ❌ You have unlimited sample size (though you should still check!) + +--- + +## Two Approaches + +### 1. Normal Approximation (Recommended) + +**Pros:** Fast, uses analytical formulas +**Cons:** Assumes normal distribution (works well for large samples) + +```python +from cluster_experiments import NormalPowerAnalysis + +power_analysis = NormalPowerAnalysis.from_dict({ + 'analysis': 'ols', + 'splitter': 'non_clustered', +}) + +# Calculate MDE for 80% power +mde = power_analysis.mde(historical_data, power=0.8) +print(f"Minimum Detectable Effect: {mde:.2%}") + +# Calculate power for specific effect size +power = power_analysis.power_analysis(historical_data, average_effect=0.05) +print(f"Power for 5% effect: {power:.1%}") +``` + +### 2. Simulation-Based + +**Pros:** Works for any distribution, more accurate for complex designs +**Cons:** Slower (runs many simulations) + +```python +from cluster_experiments import PowerAnalysis +from cluster_experiments import ClusteredSplitter, ConstantPerturbator, ClusteredOLSAnalysis + +# Define components +splitter = ClusteredSplitter(cluster_cols=['store_id']) +perturbator = ConstantPerturbator() # Simulates treatment effect +analysis = ClusteredOLSAnalysis(cluster_cols=['store_id']) + +# Create power analysis +power_analysis = PowerAnalysis( + splitter=splitter, + perturbator=perturbator, + analysis=analysis, + n_simulations=1000 # Number of simulations +) + +# Run power analysis +power = power_analysis.power_analysis(historical_data, average_effect=0.1) +``` + +--- + +## Understanding Components (Simulation-Based) + +For simulation-based power analysis, you need three components: + +### 1. Splitter: How to Randomize + +The **Splitter** defines how to divide your data into control and treatment groups. + +#### Available Splitters + +| Splitter | Use Case | Example | +|----------|----------|---------| +| `NonClusteredSplitter` | Individual-level randomization | User-level A/B test | +| `ClusteredSplitter` | Cluster-level randomization | Store-level test | +| `SwitchbackSplitter` | Time-based alternating treatment | Daily switchback | +| `StratifiedClusteredSplitter` | Balanced cluster randomization | Stratified by region | + +#### Example + +```python +from cluster_experiments import ClusteredSplitter + +splitter = ClusteredSplitter( + cluster_cols=['store_id'], # Randomize at store level +) +``` + +--- + +### 2. Perturbator: Simulating Treatment Effect + +The **Perturbator** simulates the treatment effect on your historical data. This lets you test "what if we had run an experiment with X% lift?" 
+ +#### Available Perturbators + +| Perturbator | Effect Type | Example | +|-------------|-------------|---------| +| `ConstantPerturbator` | Absolute increase | +$5 revenue | +| `RelativePositivePerturbator` | Percentage increase | +10% revenue | +| `BinaryPerturbator` | Binary outcome shift | +5% conversion | +| `NormalPerturbator` | Normally distributed | Variable effect | + +#### Example + +```python +from cluster_experiments import ConstantPerturbator + +perturbator = ConstantPerturbator( + average_effect=5.0 # Add $5 to treatment group +) + +# Or for relative effects +from cluster_experiments import RelativePositivePerturbator + +perturbator = RelativePositivePerturbator( + average_effect=0.10 # 10% increase +) +``` + +--- + +### 3. Analysis: Measuring Impact + +The **Analysis** component specifies which statistical method to use for measuring the treatment effect. + +#### Available Analysis Methods + +| Analysis | Use Case | +|----------|----------| +| `OLSAnalysis` | Standard A/B test | +| `ClusteredOLSAnalysis` | Cluster randomization with clustered SE | +| `TTestClusteredAnalysis` | T-test on cluster-aggregated data | +| `GeeExperimentAnalysis` | Correlated observations (GEE) | +| `SyntheticControlAnalysis` | Observational studies | + +#### Example + +```python +from cluster_experiments import ClusteredOLSAnalysis + +analysis = ClusteredOLSAnalysis( + cluster_cols=['store_id'], # Cluster standard errors +) +``` + +--- + +## Complete Example: Store-Level Experiment + +Let's design a store-level promotional experiment: + +```python +import pandas as pd +import numpy as np +from cluster_experiments import PowerAnalysis +from cluster_experiments import ClusteredSplitter, RelativePositivePerturbator, ClusteredOLSAnalysis + +# Historical data: daily store sales +np.random.seed(42) +n_stores = 50 +days = 30 + +historical_data = [] +for store_id in range(n_stores): + for day in range(days): + historical_data.append({ + 'store_id': store_id, + 'day': day, + 'revenue': np.random.gamma(shape=100, scale=5) + np.random.normal(0, 50) + }) + +df = pd.DataFrame(historical_data) + +# Define power analysis components +splitter = ClusteredSplitter(cluster_cols=['store_id']) +perturbator = RelativePositivePerturbator() # % increase +analysis = ClusteredOLSAnalysis(cluster_cols=['store_id']) + +power_analysis = PowerAnalysis( + splitter=splitter, + perturbator=perturbator, + analysis=analysis, + target_col='revenue', + n_simulations=500 +) + +# Question 1: What power do we have for 10% lift? +power = power_analysis.power_analysis(df, average_effect=0.10) +print(f"Power for 10% lift: {power:.1%}") + +# Question 2: How does power change with effect size? 
+power_curve = power_analysis.power_line(
+    df,
+    average_effects=[0.05, 0.10, 0.15, 0.20]
+)
+print("\nPower Curve:")
+print(power_curve)
+```
+
+---
+
+## Power Curves and Timelines
+
+### Power Curve (Effect Size)
+
+See how power changes with different effect sizes:
+
+```python
+power_curve = power_analysis.power_line(
+    df,
+    average_effects=[0.03, 0.05, 0.07, 0.10, 0.15]
+)
+```
+
+### MDE Timeline (Experiment Duration)
+
+See how MDE changes with experiment length:
+
+```python
+from cluster_experiments import NormalPowerAnalysis
+
+npw = NormalPowerAnalysis.from_dict({
+    'analysis': 'clustered_ols',
+    'cluster_cols': ['store_id'],
+    'time_col': 'date',
+})
+
+mde_timeline = npw.mde_time_line(
+    df,
+    powers=[0.8],  # 80% power
+    experiment_length=[7, 14, 21, 30]  # days
+)
+```
+
+---
+
+## Dictionary Configuration
+
+For simpler setups, use dictionary configuration:
+
+```python
+from cluster_experiments import PowerAnalysis
+
+config = {
+    'splitter': 'clustered',
+    'cluster_cols': ['store_id'],
+    'perturbator': 'relative_positive',
+    'analysis': 'clustered_ols',
+    'n_simulations': 500,
+}
+
+power_analysis = PowerAnalysis.from_dict(config)
+```
+
+---
+
+## Tips and Best Practices
+
+### 1. Use Historical Data
+
+- Use real historical data that matches your experiment setup
+- More data = more reliable power estimates
+- Ensure your historical period is representative
+
+### 2. Match Components to Design
+
+- If the experiment is cluster-randomized, use `ClusteredSplitter` and `ClusteredOLSAnalysis`
+- If individual-level, use `NonClusteredSplitter` and `OLSAnalysis`
+- Match the perturbator to the expected effect type (absolute vs relative)
+
+### 3. Simulation Count
+
+- More simulations = more accurate but slower
+- Start with 100-500 for exploration
+- Use 1000+ for final estimates
+
+### 4. Power Standards
+
+- **80% power** is standard (an 80% chance of detecting the effect if it exists)
+- **Higher power** requires a larger sample size or a longer duration
+- Consider business tradeoffs (speed vs certainty)
+
+---
+
+## Common Questions
+
+### Q: What's the difference between power analysis and experiment analysis?
+
+**Power analysis** (before the experiment):
+- Uses historical data
+- Simulates different scenarios
+- Answers: "How much data do I need?"
+
+**Experiment analysis** (after the experiment):
+- Uses actual experiment data
+- Measures real treatment effects
+- Answers: "What was the impact?"
+
+### Q: When should I use simulation vs normal approximation?
+
+**Normal approximation:**
+- ✅ Fast results
+- ✅ Standard experimental designs
+- ✅ Large sample sizes
+
+**Simulation:**
+- ✅ Complex designs (switchback, stratified)
+- ✅ Non-normal distributions
+- ✅ Small sample sizes
+
+### Q: My power is too low, what can I do?
+
+Options to increase power:
+1. **Increase sample size** (more users, or run the experiment longer)
+2. **Use variance reduction** (CUPAC/CUPED)
+3. **Detect larger effects** (focus on bigger changes)
+4.
**Use more sensitive metrics** + +--- + +## Next Steps + +- **[Normal Power Example](normal_power.html)** - Compare simulation vs normal approximation +- **[Power Lines Example](normal_power_lines.html)** - Visualize power curves +- **[Switchback Power](switchback.html)** - Power analysis for switchback designs +- **[API Reference](api/power_analysis.html)** - Detailed power analysis documentation + diff --git a/docs/quickstart.md b/docs/quickstart.md index 2b10e458..f4b0ae81 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -1,134 +1,298 @@ # Quickstart +Get started with `cluster-experiments` in minutes! This guide will walk you through installation and your first experiment analysis. + +--- + ## Installation -You can install **Cluster Experiments** via pip: +Install via pip: ```bash pip install cluster-experiments ``` -!!! info "Python Version Support" - **Cluster Experiments** requires **Python 3.9 or higher**. Make sure your environment meets this requirement before proceeding with the installation. +!!! info "Requirements" + - **Python 3.9 or higher** + - Main dependencies: `pandas`, `numpy`, `scipy`, `statsmodels` --- -## Usage +## Your First Analysis (5 minutes) + +Let's analyze a simple A/B test with multiple metrics. This is the most common use case. -Designing and analyzing experiments can feel overwhelming at times. After formulating a testable hypothesis, -you're faced with a series of routine tasks. From collecting and transforming raw data to measuring the statistical significance of your experiment results and constructing confidence intervals, -it can quickly become a repetitive and error-prone process. -*Cluster Experiments* is here to change that. Built on top of well-known packages like `pandas`, `numpy`, `scipy` and `statsmodels`, it automates the core steps of an experiment, streamlining your workflow, saving you time and effort, while maintaining statistical rigor. -## Key Features -- **Modular Design**: Each componentβ€”`Splitter`, `Perturbator`, and `Analysis`β€”is independent, reusable, and can be combined in any way you need. -- **Flexibility**: Whether you're conducting a simple A/B test or a complex clustered experiment, Cluster Experiments adapts to your needs. -- **Statistical Rigor**: Built-in support for advanced statistical methods ensures that your experiments maintain high standards, including clustered standard errors and variance reduction techniques like CUPED and CUPAC. 
+```python +import pandas as pd +import numpy as np +from cluster_experiments import AnalysisPlan + +# Simulate experiment data +np.random.seed(42) +n_users = 1000 + +data = pd.DataFrame({ + 'user_id': range(n_users), + 'variant': np.random.choice(['control', 'treatment'], n_users), + 'orders': np.random.poisson(2.5, n_users), + 'visits': np.random.poisson(10, n_users), +}) + +# Add treatment effect +data.loc[data['variant'] == 'treatment', 'orders'] += np.random.poisson(0.5, (data['variant'] == 'treatment').sum()) +data['converted'] = (data['orders'] > 0).astype(int) + +# Define analysis plan +analysis_plan = AnalysisPlan.from_metrics_dict({ + 'metrics': [ + # Simple metric + {'alias': 'conversions', 'name': 'converted', 'metric_type': 'simple'}, + # Ratio metric + {'alias': 'conversion_rate', 'metric_type': 'ratio', + 'numerator': 'converted', 'denominator': 'visits'}, + ], + 'variants': [ + {'name': 'control', 'is_control': True}, + {'name': 'treatment', 'is_control': False}, + ], + 'variant_col': 'variant', + 'analysis_type': 'ols', +}) + +# Run analysis +results = analysis_plan.analyze(data) +print(results.to_dataframe()) +``` -The core functionality of *Cluster Experiments* revolves around several intuitive, self-contained classes and methods: +**Output:** A comprehensive scorecard with treatment effects, confidence intervals, and p-values! + +--- -- **Splitter**: Define how your control and treatment groups are split. -- **Perturbator**: Specify the type of effect you want to test. -- **Analysis**: Perform statistical inference to measure the impact of your experiment. +## Understanding Your Results +The results dataframe includes: + +| Column | Description | +|--------|-------------| +| `metric` | Name of the metric being analyzed | +| `control_mean` | Average value in control group | +| `treatment_mean` | Average value in treatment group | +| `ate` | Average Treatment Effect (absolute difference) | +| `ate_ci_lower/upper` | 95% confidence interval for ATE | +| `p_value` | Statistical significance (< 0.05 = significant) | +| `relative_effect` | Percentage change (lift) | + +!!! tip "Interpreting Results" + - **p_value < 0.05**: Result is statistically significant + - **relative_effect**: Shows % change (e.g., 0.10 = 10% increase) + - **Confidence interval**: If it doesn't include 0, effect is significant --- -### `Splitter`: Defining Control and Treatment Groups +## Common Use Cases -The `Splitter` classes are responsible for dividing your data into control and treatment groups. The way you split your data depends on the **metric** (e.g., simple, ratio) you want to observe and the unit of observation (e.g., users, sessions, time periods). +### 1. Analyzing an Experiment -#### Features: +**When:** You've already run your experiment and have the data. -- **Randomized Splits**: Simple random assignment of units to control and treatment groups. -- **Stratified Splits**: Ensure balanced representation of key segments (e.g., geographic regions, user cohorts). -- **Time-Based Splits**: Useful for switchback experiments or time-series data. +**Example:** See [Simple A/B Test](examples/simple_ab_test.html) for a complete walkthrough. ```python -from cluster_experiments import RandomSplitter +# Use AnalysisPlan with your experiment data +results = analysis_plan.analyze(experiment_data) +``` -splitter = RandomSplitter( - cluster_cols=["cluster_id"], # Split by clusters - treatment_col="treatment", # Name of the treatment column -) +--- + +### 2. 
Power Analysis (Sample Size Planning) + +**When:** You're designing an experiment and need to know how many users/time you need. + +**Example:** Calculate power or Minimum Detectable Effect (MDE). + +```python +from cluster_experiments import NormalPowerAnalysis + +# Define your analysis setup +power_analysis = NormalPowerAnalysis.from_dict({ + 'analysis': 'ols', + 'splitter': 'non_clustered', +}) + +# Calculate MDE for 80% power +mde = power_analysis.mde(historical_data, power=0.8) +print(f"Need {mde:.2%} effect size for 80% power") + +# Or calculate power for a given effect size +power = power_analysis.power_analysis(historical_data, average_effect=0.05) +print(f"Power: {power:.1%}") ``` +**Learn more:** See [Power Analysis Guide](power_analysis_guide.html) for detailed explanation. + --- -### `Perturbator`: Simulating the Treatment Effect +### 3. Cluster Randomization -The `Perturbator` classes define the type of effect you want to test. It simulates the treatment effect on your data, allowing you to evaluate the impact of your experiment. +**When:** Randomization happens at group level (stores, cities) rather than individual level. -#### Features: +**Why:** Required when there are spillover effects or operational constraints. -- **Absolute Effects**: Add a fixed uplift to the treatment group. -- **Relative Effects**: Apply a percentage-based uplift to the treatment group. -- **Custom Effects**: Define your own effect size or distribution. +**Example:** ```python -from cluster_experiments import ConstantPerturbator +# Use clustered_ols for cluster-randomized experiments +analysis_plan = AnalysisPlan.from_metrics_dict({ + 'metrics': [{'alias': 'revenue', 'name': 'purchase_amount'}], + 'variants': [ + {'name': 'control', 'is_control': True}, + {'name': 'treatment', 'is_control': False}, + ], + 'variant_col': 'variant', + 'analysis_type': 'clustered_ols', # ← Key difference! + 'analysis_config': { + 'cluster_cols': ['store_id'] # ← Specify clustering variable + } +}) +``` + +**Learn more:** See [Cluster Randomization Example](examples/cluster_randomization.html). + +--- + +### 4. Variance Reduction (CUPAC/CUPED) + +**When:** You have pre-experiment data and want to reduce variance for more sensitive tests. + +**Benefits:** Detect smaller effects with same sample size. + +**Example:** -perturbator = ConstantPerturbator( - average_effect=5.0 # Simulate a nominal 5% uplift +```python +from cluster_experiments import TargetAggregation, HypothesisTest, SimpleMetric, Variant + +# Define CUPAC model using pre-experiment data +cupac_model = TargetAggregation( + agg_col="customer_id", + target_col="order_value" +) + +# Create hypothesis test with CUPAC +test = HypothesisTest( + metric=SimpleMetric(alias="revenue", name="order_value"), + analysis_type="clustered_ols", + analysis_config={ + "cluster_cols": ["customer_id"], + "covariates": ["customer_age", "estimate_order_value"], + }, + cupac_config={ + "cupac_model": cupac_model, + "target_col": "order_value", + }, ) + +plan = AnalysisPlan( + tests=[test], + variants=[Variant("control", is_control=True), Variant("treatment")], + variant_col="variant", +) + +# Analyze with both experiment and pre-experiment data +results = plan.analyze(experiment_data, pre_experiment_data) ``` +**Learn more:** See [CUPAC Example](cupac_example.html). 
+ --- -### `Analysis`: Measuring the Impact +## Ratio Metrics -Once your data is split and the treatment effect is applied, the `Analysis` component helps you measure the statistical significance of the experiment results. It provides tools for calculating effects, confidence intervals, and p-values. +`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value). -You can use it for both **experiment design** (pre-experiment phase) and **analysis** (post-experiment phase). +```python +# Ratio metric: conversions / visits +{ + 'alias': 'conversion_rate', + 'metric_type': 'ratio', + 'numerator': 'converted', # Numerator column + 'denominator': 'visits' # Denominator column +} +``` -#### Features: +The library automatically handles the statistical complexities of ratio metrics using the Delta Method. -- **Statistical Tests**: Perform t-tests, OLS regression, and other hypothesis tests. -- **Effect Size**: Calculate both absolute and relative effects. -- **Confidence Intervals**: Construct confidence intervals for your results. +--- -Example: +## Multi-Dimensional Analysis -```python -from cluster_experiments import TTestClusteredAnalysis +Slice your results by dimensions (e.g., city, device type): -analysis = TTestClusteredAnalysis( - cluster_cols=["cluster_id"], # Cluster-level analysis - treatment_col="treatment", # Name of the treatment column - target_col="outcome" # Metric to analyze -) +```python +analysis_plan = AnalysisPlan.from_metrics_dict({ + 'metrics': [...], + 'variants': [...], + 'variant_col': 'variant', + 'dimensions': [ + {'name': 'city', 'values': ['NYC', 'LA', 'Chicago']}, + {'name': 'device', 'values': ['mobile', 'desktop']}, + ], + 'analysis_type': 'ols', +}) ``` +Results will include treatment effects for each dimension slice! + --- -### Putting It All Together for Experiment Design +## Quick Reference -You can combine all classes as inputs in the `PowerAnalysis` class, where you can analyze different experiment settings, power lines, and Minimal Detectable Effects (MDEs). +### Analysis Types +Choose the appropriate analysis method: + +| Analysis Type | When to Use | +|--------------|-------------| +| `ols` | Standard A/B test, individual randomization | +| `clustered_ols` | Cluster randomization (stores, cities, etc.) 
| +| `gee` | Repeated measures, correlated observations | +| `mlm` | Multi-level/hierarchical data | +| `synthetic_control` | Observational studies, no randomization | + +### Dictionary vs Class-Based API + +Two ways to define analysis plans: + +**Dictionary (simpler):** ```python -from cluster_experiments import PowerAnalysis -from cluster_experiments import RandomSplitter, ConstantPerturbator, TTestClusteredAnalysis - -# Define the components -splitter = RandomSplitter(cluster_cols=["cluster_id"], treatment_col="treatment") -perturbator = ConstantPerturbator(average_effect=0.1) -analysis = TTestClusteredAnalysis(cluster_cols=["cluster_id"], treatment_col="treatment", target_col="outcome") - -# Create the experiment -experiment = PowerAnalysis( - perturbator=perturbator, - splitter=splitter, - analysis=analysis, - target_col="outcome", - treatment_col="treatment" -) +plan = AnalysisPlan.from_metrics_dict({...}) +``` + +**Class-based (more control):** +```python +from cluster_experiments import HypothesisTest, SimpleMetric, Variant -# Run the experiment -results = experiment.power_analysis() +plan = AnalysisPlan( + tests=[HypothesisTest(metric=SimpleMetric(...), ...)], + variants=[Variant(...)], + variant_col='variant' +) ``` --- ## Next Steps -- Explore the **Core Documentation** for detailed explanations of each component. -- Check out the **Usage Examples** for practical applications of the package. +Now that you've completed your first analysis, explore: + +- πŸ“– **[API Reference](api/experiment_analysis.html)** - Detailed documentation for all classes +- **[Example Gallery](cupac_example.html)** - Real-world use cases and patterns +- **[Power Analysis Guide](power_analysis_guide.html)** - Design experiments with confidence +- 🀝 **[Contributing](../CONTRIBUTING.md)** - Help improve the library + +--- + +## Getting Help + +- πŸ“ [Documentation](https://david26694.github.io/cluster-experiments/) +- πŸ› [Report Issues](https://github.com/david26694/cluster-experiments/issues) +- πŸ’¬ [Discussions](https://github.com/david26694/cluster-experiments/discussions) diff --git a/mkdocs.yml b/mkdocs.yml index db41ed48..49784f27 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -9,45 +9,58 @@ docs_dir: docs site_dir: site nav: - - Home: ../README.md + - Home: index.md - Quickstart: quickstart.md - - Core Documentation: - - API: - - Experiment analysis methods: api/experiment_analysis.md - - Perturbators: api/perturbator.md - - Splitter: api/random_splitter.md - - Pre experiment outcome model: api/cupac_model.md - - Power config: api/power_config.md - - Power analysis: api/power_analysis.md - - Washover: api/washover.md - - Metric: api/metric.md - - Variant: api/variant.md - - Dimension: api/dimension.md - - Hypothesis Test: api/hypothesis_test.md - - Analysis Plan: api/analysis_plan.md - - Usage Examples: + - Power Analysis Guide: power_analysis_guide.md + + - API Reference: + - Experiment Analysis: + - Analysis Plan: api/analysis_plan.md + - Analysis Results: api/analysis_results.md + - Experiment Analysis Methods: api/experiment_analysis.md + - Hypothesis Test: api/hypothesis_test.md + - Metrics & Variants: + - Metric: api/metric.md + - Variant: api/variant.md + - Dimension: api/dimension.md + - Power Analysis: + - Power Analysis: api/power_analysis.md + - Power Config: api/power_config.md + - Randomization: + - Splitters: api/random_splitter.md - Variance Reduction: - - CUPAC: cupac_example.ipynb + - CUPAC Model: api/cupac_model.md - Switchback: - - Stratified switchback: switchback.ipynb - - 
Switchback calendar visualization: plot_calendars.ipynb - - Visualization - 4-hour switches: plot_calendars_hours.ipynb - - Multiple treatments: multivariate.ipynb - - AA test clustered: aa_test.ipynb - - Paired T test: paired_ttest.ipynb - - Different hypotheses tests: analysis_with_different_hypotheses.ipynb - - Washover: washover_example.ipynb - - Normal Power: - - Compare with simulation: normal_power.ipynb - - Time-lines: normal_power_lines.ipynb - - Synthetic control: synthetic_control.ipynb - - Delta Method Analysis: delta_method.ipynb - - Experiment analysis workflow: experiment_analysis.ipynb - - Contribute: - - Contributing Guidelines: development/contributing.md - - Code Structure: development/code_structure.md - - Testing: development/testing.md - - Building Documentation: development/building_docs.md + - Washover: api/washover.md + - Perturbators: api/perturbator.md + + - Examples: + - Basic Usage: + - Simple A/B Test: examples/simple_ab_test.ipynb + - Experiment Analysis Workflow: experiment_analysis.ipynb + - AA Test (Clustered): aa_test.ipynb + - Analysis Methods: + - Different Hypothesis Tests: analysis_with_different_hypotheses.ipynb + - Paired T-Test: paired_ttest.ipynb + - Delta Method Analysis: delta_method.ipynb + - Variance Reduction: + - CUPAC Example: cupac_example.ipynb + - Cluster Experiments: + - Cluster Randomization: examples/cluster_randomization.ipynb + - Switchback Experiments: + - Stratified Switchback: switchback.ipynb + - Calendar Visualization: plot_calendars.ipynb + - 4-Hour Switches: plot_calendars_hours.ipynb + - Washover Example: washover_example.ipynb + - Power Analysis: + - Normal Power Comparison: normal_power.ipynb + - Power Time-Lines: normal_power_lines.ipynb + - Advanced Topics: + - Multiple Treatments: multivariate.ipynb + - Synthetic Control: synthetic_control.ipynb + - Custom Classes: create_custom_classes.ipynb + + - Contributing: CONTRIBUTING.md extra: social: From a28517d480af851b6b3110ab2acd1b7c7bb42679 Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Sat, 1 Nov 2025 16:33:36 +0100 Subject: [PATCH 6/9] revamp v2. readme --- README.md | 292 +++++++++++++++--------- docs/examples/simple_ab_test.ipynb | 36 ++- docs/power_analysis_guide.md | 349 ----------------------------- docs/quickstart.md | 162 ++++++++++--- mkdocs.yml | 2 +- 5 files changed, 346 insertions(+), 495 deletions(-) delete mode 100644 docs/power_analysis_guide.md diff --git a/README.md b/README.md index 08e56291..7fd4f312 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ ## πŸ“– What is cluster-experiments? -`cluster-experiments` provides a complete toolkit for designing, running, and analyzing experiments, with particular strength in handling **clustered randomization** and complex experimental designs. Originally developed to address challenges in **switchback experiments** and scenarios with **network effects** where standard randomization isn't feasible, it has evolved into a general-purpose experimentation framework supporting both simple A/B tests and sophisticated designs. +`cluster-experiments` provides a complete toolkit for designing, running, and analyzing experiments, with particular strength in handling **clustered randomization** and complex experimental designs. Originally developed to address challenges in **switchback experiments** and scenarios with **network effects** where standard randomization isn't feasible, it has evolved into a general-purpose experimentation framework supporting both simple A/B tests and other randomization designs. 
### Why "cluster"? @@ -23,16 +23,16 @@ The name reflects the library's origins in handling **cluster-randomized experim - **Operational Constraints**: You can't randomize individuals (e.g., testing restaurant menu changes) - **Switchback Designs**: Treatment alternates over time periods within the same unit -While the library excels at these complex scenarios, it's equally capable of handling standard A/B tests with individual-level randomization. +While the library is aimed at these scenarios, it's equally capable of handling standard A/B tests with individual-level randomization. --- -## πŸš€ Key Features +## Key Features -### πŸ“Š **Comprehensive Experiment Design** +### **Experiment Design** - **Power Analysis & Sample Size Calculation** - Simulation-based (Monte Carlo) for any design complexity - - Analytical (CLT-based) for standard designs + - Analytical, (CLT-based) for standard designs - Minimal Detectable Effect (MDE) estimation - **Multiple Experimental Designs** @@ -42,10 +42,9 @@ While the library excels at these complex scenarios, it's equally capable of han - Stratified randomization - Observational studies with Synthetic Control -### πŸ”¬ **Advanced Statistical Methods** +### **Statistical Methods** - **Multiple Analysis Methods** - OLS and Clustered OLS regression - - T-tests and Paired T-tests - GEE (Generalized Estimating Equations) - Mixed Linear Models (MLM) - Delta Method for ratio metrics @@ -56,7 +55,7 @@ While the library excels at these complex scenarios, it's equally capable of han - CUPAC (CUPED with Pre-experiment Aggregations) - Covariate adjustment -### πŸ“ˆ **Scalable Analysis Workflow** +### **Analysis Workflow** - **Scorecard Generation**: Analyze multiple metrics simultaneously - **Multi-dimensional Slicing**: Break down results by segments - **Multiple Treatment Arms**: Compare several treatments at once @@ -74,75 +73,110 @@ pip install cluster-experiments ## ⚑ Quick Example -Here's a simple example showing how to analyze an experiment with two metrics: a simple metric (conversions) and a ratio metric (conversion rate). 
+Here's a simple example showing how to analyze an experiment with multiple metrics organized by category - a common production pattern: ```python import pandas as pd import numpy as np -from cluster_experiments import AnalysisPlan +from cluster_experiments import ( + AnalysisPlan, SimpleMetric, RatioMetric, + Variant, HypothesisTest +) # Simulate experiment data np.random.seed(42) -n_users = 1000 +n_users = 5000 data = pd.DataFrame({ 'user_id': range(n_users), 'variant': np.random.choice(['control', 'treatment'], n_users), - 'orders': np.random.poisson(2.5, n_users), # Number of orders (simple metric) - 'visits': np.random.poisson(10, n_users), # Number of visits (for ratio) + 'orders': np.random.poisson(2.5, n_users), # Number of orders + 'visits': np.random.poisson(10, n_users), # Number of visits }) -# Add a small treatment effect to orders -data.loc[data['variant'] == 'treatment', 'orders'] += np.random.poisson(0.5, (data['variant'] == 'treatment').sum()) - -# Calculate conversions (users who ordered) -data['converted'] = (data['orders'] > 0).astype(int) - -# Define analysis plan -analysis_plan = AnalysisPlan.from_metrics_dict({ - 'metrics': [ - # Simple metric: total conversions - { - 'alias': 'conversions', - 'name': 'converted', - 'metric_type': 'simple' - }, - # Ratio metric: conversion rate (conversions / visits) - { - 'alias': 'conversion_rate', - 'metric_type': 'ratio', - 'numerator': 'converted', - 'denominator': 'visits' - }, - ], - 'variants': [ - {'name': 'control', 'is_control': True}, - {'name': 'treatment', 'is_control': False}, - ], - 'variant_col': 'variant', - 'analysis_type': 'ols', # Use OLS for simple A/B test -}) +# Add treatment effect: +20% orders for treatment +data.loc[data['variant'] == 'treatment', 'orders'] += \ + np.random.poisson(0.5, (data['variant'] == 'treatment').sum()) -# Run analysis -results = analysis_plan.analyze(data) +# Prepare data +data['converted'] = data['orders'].astype(int) -# View results as a dataframe -print(results.to_dataframe()) +# Define metrics by type and category +absolute_metrics = { + "orders": "revenue" # metric_name: category +} + +ratio_metrics = { + "conversion_rate": { + "category": "conversion", + "components": ["converted", "visits"] # [numerator, denominator] + } +} + +# Define variants +variants = [ + Variant("control", is_control=True), + Variant("treatment", is_control=False) +] + +# Build hypothesis tests from metric definitions +hypothesis_tests = [] + +# 1. Ratio metrics: use delta method for proper ratio analysis +for metric_name, config in ratio_metrics.items(): + metric = RatioMetric( + alias=f"{config['category']}__{metric_name}", + numerator_name=config['components'][0], + denominator_name=config['components'][1] + ) + hypothesis_tests.append( + HypothesisTest( + metric=metric, + analysis_type="delta", + analysis_config={ + "scale_col": metric.denominator_name, + "cluster_cols": ["user_id"] + } + ) + ) + +# 2. 
Absolute metrics: use standard OLS +for metric_name, category in absolute_metrics.items(): + metric = SimpleMetric( + alias=f"{category}__{metric_name}", + name=metric_name + ) + hypothesis_tests.append( + HypothesisTest( + metric=metric, + analysis_type="ols" + ) + ) + +# Create and run analysis plan +analysis_plan = AnalysisPlan( + tests=hypothesis_tests, + variants=variants, + variant_col='variant' +) + +results = analysis_plan.analyze(data, verbose=True) +results_df = results.to_dataframe() +print(results_df) ``` -**Output**: A comprehensive scorecard with treatment effects, confidence intervals, and p-values for each metric: +**Output**: A comprehensive scorecard with treatment effects, confidence intervals, and p-values: ``` - metric control_mean treatment_mean ... p_value ci_lower ci_upper -0 conversions 0.485 0.532 ... 0.023 0.006 0.088 -1 conversion_rate 0.048 0.053 ... 0.031 0.0004 0.009 + metric_alias control treatment ate p_value ... + conversion__conversion_rate 0.250 0.303 +20.9% < 0.001 ... + revenue__orders 2.510 3.005 +19.7% < 0.001 ... ``` -This simple example demonstrates: -- βœ… Working with both **simple** and **ratio metrics** -- βœ… Easy experiment setup with **dictionary-based configuration** -- βœ… Statistical inference with **confidence intervals and p-values** -- βœ… **Automatic scorecard generation** for multiple metrics +This example demonstrates: +- βœ… **Organized metric definitions** - Group metrics by type and category +- βœ… **Multiple analysis methods** - Delta method for ratios, OLS for totals +- βœ… **Scalable** - Easy to add more metrics by updating dictionaries --- @@ -157,7 +191,7 @@ For detailed guides, API references, and advanced examples, visit our [**documen --- -## 🎯 Core Concepts +## Core Concepts The library is built around three main components: @@ -189,7 +223,7 @@ For **power analysis**, combine these with: --- -## πŸ” When to Use cluster-experiments +## When to Use cluster-experiments βœ… **Use cluster-experiments when you need to:** - Design and analyze **cluster-randomized experiments** @@ -200,82 +234,118 @@ For **power analysis**, combine these with: - Analyze **multiple metrics** with dimensional slicing - Work with **ratio metrics** (rates, averages, etc.) -πŸ“Š **Perfect for:** -- Marketplace/platform experiments (drivers, restaurants, stores) + **Perfect for:** + - A/B tests +- Marketplace/platform experiments (drivers, restaurants, stores,...) - Geographic experiments (cities, regions) - Time-based tests (switchbacks, dayparting) -- Standard A/B tests with advanced analysis needs --- ## πŸ› οΈ Advanced Features -### Variance Reduction with CUPAC +### Variance Reduction (CUPED/CUPAC) -Reduce variance by leveraging pre-experiment data: +Reduce variance and detect smaller effects by leveraging pre-experiment data. Use historical metrics as covariates to control for pre-existing differences between groups. -```python -from cluster_experiments import AnalysisPlan, TargetAggregation, HypothesisTest, SimpleMetric, Variant +**Use cases:** +- Have pre-experiment metrics for your users/clusters +- Want to detect smaller treatment effects +- Need more sensitive tests with same sample size -# Define CUPAC model -cupac_model = TargetAggregation( - agg_col="customer_id", - target_col="order_value" -) +See the [CUPAC Example](https://david26694.github.io/cluster-experiments/cupac_example.html) for detailed implementation. 
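+
+For a quick sense of the workflow, here is a condensed sketch adapted from the quickstart's CUPAC example; the `customer_id`/`order_value` columns and the `estimate_order_value` covariate added by the CUPAC model are illustrative:
+
+```python
+from cluster_experiments import (
+    AnalysisPlan, HypothesisTest, SimpleMetric, TargetAggregation, Variant
+)
+
+# CUPAC model: estimates the outcome from pre-experiment data
+cupac_model = TargetAggregation(agg_col="customer_id", target_col="order_value")
+
+test = HypothesisTest(
+    metric=SimpleMetric(alias="revenue", name="order_value"),
+    analysis_type="clustered_ols",
+    analysis_config={
+        "cluster_cols": ["customer_id"],
+        "covariates": ["estimate_order_value"],  # produced by the CUPAC model
+    },
+    cupac_config={"cupac_model": cupac_model, "target_col": "order_value"},
+)
+
+plan = AnalysisPlan(
+    tests=[test],
+    variants=[Variant("control", is_control=True), Variant("treatment")],
+    variant_col="variant",
+)
+
+# Analyze with both experiment and pre-experiment data
+results = plan.analyze(experiment_df, pre_experiment_df)
+```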
-# Create hypothesis test with CUPAC -test = HypothesisTest( - metric=SimpleMetric(alias="revenue", name="order_value"), - analysis_type="clustered_ols", - analysis_config={ - "cluster_cols": ["customer_id"], - "covariates": ["customer_age", "estimate_order_value"], - }, - cupac_config={ - "cupac_model": cupac_model, - "target_col": "order_value", - }, -) +### Cluster Randomization -plan = AnalysisPlan( - tests=[test], - variants=[Variant("control", is_control=True), Variant("treatment")], - variant_col="variant", -) +Handle experiments where randomization occurs at group level (stores, cities, regions) rather than individual level. Essential for managing spillover effects and operational constraints. -# Analyze with pre-experiment data -results = plan.analyze(experiment_df, pre_experiment_df) -``` +See the [Cluster Randomization Guide](https://david26694.github.io/cluster-experiments/examples/cluster_randomization.html) for details. + +### Switchback Experiments + +Design and analyze time-based crossover experiments where the same units receive both control and treatment at different times. + +See the [Switchback Example](https://david26694.github.io/cluster-experiments/switchback.html) for implementation. + +--- -### Power Analysis +## Power Analysis -Estimate the power of your experiment design: +Design your experiment by estimating required sample size and detectable effects. Here's a complete example using **analytical (CLT-based) power analysis**: ```python -from cluster_experiments import PowerAnalysis, NormalPowerAnalysis -from cluster_experiments import ClusteredSplitter, ConstantPerturbator, ClusteredOLSAnalysis - -# Simulation-based power analysis -power_sim = PowerAnalysis( - splitter=ClusteredSplitter(cluster_cols=['city']), - perturbator=ConstantPerturbator(average_effect=0.1), - analysis=ClusteredOLSAnalysis(cluster_cols=['city']), - n_simulations=1000 -) +import numpy as np +import pandas as pd +from cluster_experiments import NormalPowerAnalysis -power = power_sim.power_analysis(historical_data, average_effect=0.1) -print(f"Estimated power: {power:.2%}") +# Create sample historical data +np.random.seed(42) +N = 500 -# Analytical power analysis (faster for standard designs) -power_analytical = NormalPowerAnalysis.from_dict({ - 'cluster_cols': ['city'], - 'analysis': 'clustered_ols' +historical_data = pd.DataFrame({ + 'user_id': range(N), + 'metric': np.random.normal(100, 20, N), + 'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d') }) -mde = power_analytical.mde(historical_data, power=0.8) -print(f"Minimum Detectable Effect at 80% power: {mde:.4f}") +# Initialize analytical power analysis (fast, CLT-based) +power_analysis = NormalPowerAnalysis.from_dict({ + 'analysis': 'ols', + 'splitter': 'non_clustered', + 'target_col': 'metric', + 'time_col': 'date' # Required for mde_time_line +}) + +# 1. Calculate power for a given effect size +power = power_analysis.power_analysis(historical_data, average_effect=5.0) +print(f"Power for detecting +5 unit effect: {power:.1%}") + +# 2. Calculate Minimum Detectable Effect (MDE) for desired power +mde = power_analysis.mde(historical_data, power=0.8) +print(f"Minimum detectable effect at 80% power: {mde:.2f}") + +# 3. Power curve: How power changes with effect size +power_curve = power_analysis.power_line( + historical_data, + average_effects=[2.0, 4.0, 6.0, 8.0, 10.0] +) + +# 4. 
MDE timeline: How MDE changes with experiment length +mde_timeline = power_analysis.mde_time_line( + historical_data, + powers=[0.8], + experiment_length=[7, 14, 21, 30] +) +``` + +**Output:** +``` +Power for detecting +5 unit effect: 81.1% +Minimum detectable effect at 80% power: 4.93 + +Power Curve: + effect power + 2.0 20.6% + 4.0 62.2% + 6.0 92.6% + 8.0 99.5% + 10.0 100.0% + +MDE Timeline (experiment length β†’ MDE): + 7 days: 10.64 + 14 days: 7.62 + 21 days: 6.14 + 30 days: 4.93 ``` +**Key methods:** +- `power_analysis()`: Calculate power for a given effect +- `mde()`: Calculate minimum detectable effect +- `power_line()`: Generate power curves across effect sizes +- `mde_time_line()`: Calculate MDE for different experiment lengths + +For simulation-based power analysis (for complex designs), see the [Power Analysis Guide](https://david26694.github.io/cluster-experiments/power_analysis_guide.html). + --- ## 🀝 Contributing diff --git a/docs/examples/simple_ab_test.ipynb b/docs/examples/simple_ab_test.ipynb index e7b4c6a7..34eb42a6 100644 --- a/docs/examples/simple_ab_test.ipynb +++ b/docs/examples/simple_ab_test.ipynb @@ -20,9 +20,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, - "outputs": [], + "outputs": [ + { + "ename": "ModuleNotFoundError", + "evalue": "No module named 'pandas'", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[1], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mpandas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mpd\u001b[39;00m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mnumpy\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mnp\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mcluster_experiments\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m AnalysisPlan\n", + "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'pandas'" + ] + } + ], "source": [ "import pandas as pd\n", "import numpy as np\n", @@ -106,8 +118,8 @@ " {\n", " 'alias': 'conversion_rate', \n", " 'metric_type': 'ratio',\n", - " 'numerator': 'converted',\n", - " 'denominator': 'visits'\n", + " 'numerator_name': 'converted',\n", + " 'denominator_name': 'visits'\n", " },\n", " # Simple metric: total revenue\n", " {\n", @@ -174,8 +186,22 @@ } ], "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" } }, "nbformat": 4, diff --git a/docs/power_analysis_guide.md b/docs/power_analysis_guide.md deleted file mode 100644 index 190b4a35..00000000 --- a/docs/power_analysis_guide.md +++ /dev/null @@ -1,349 +0,0 @@ -# Power Analysis Guide - -This guide explains how to design experiments using **power analysis** to determine sample sizes and experiment duration. 
- ---- - -## What is Power Analysis? - -**Power analysis** helps you answer questions like: -- How many users do I need to detect a 5% lift? -- How long should I run my experiment? -- What's the smallest effect I can reliably detect? - -This is done **before** running your experiment, using historical data to simulate different scenarios. - ---- - -## When to Use Power Analysis - -Use power analysis when: -- βœ… **Planning an experiment**: Determine required sample size -- βœ… **Evaluating feasibility**: Check if an effect is detectable with available data -- βœ… **Optimizing duration**: Balance statistical power with business needs - -Don't need power analysis when: -- ❌ Analyzing completed experiments (use `AnalysisPlan` instead) -- ❌ You have unlimited sample size (though you should still check!) - ---- - -## Two Approaches - -### 1. Normal Approximation (Recommended) - -**Pros:** Fast, uses analytical formulas -**Cons:** Assumes normal distribution (works well for large samples) - -```python -from cluster_experiments import NormalPowerAnalysis - -power_analysis = NormalPowerAnalysis.from_dict({ - 'analysis': 'ols', - 'splitter': 'non_clustered', -}) - -# Calculate MDE for 80% power -mde = power_analysis.mde(historical_data, power=0.8) -print(f"Minimum Detectable Effect: {mde:.2%}") - -# Calculate power for specific effect size -power = power_analysis.power_analysis(historical_data, average_effect=0.05) -print(f"Power for 5% effect: {power:.1%}") -``` - -### 2. Simulation-Based - -**Pros:** Works for any distribution, more accurate for complex designs -**Cons:** Slower (runs many simulations) - -```python -from cluster_experiments import PowerAnalysis -from cluster_experiments import ClusteredSplitter, ConstantPerturbator, ClusteredOLSAnalysis - -# Define components -splitter = ClusteredSplitter(cluster_cols=['store_id']) -perturbator = ConstantPerturbator() # Simulates treatment effect -analysis = ClusteredOLSAnalysis(cluster_cols=['store_id']) - -# Create power analysis -power_analysis = PowerAnalysis( - splitter=splitter, - perturbator=perturbator, - analysis=analysis, - n_simulations=1000 # Number of simulations -) - -# Run power analysis -power = power_analysis.power_analysis(historical_data, average_effect=0.1) -``` - ---- - -## Understanding Components (Simulation-Based) - -For simulation-based power analysis, you need three components: - -### 1. Splitter: How to Randomize - -The **Splitter** defines how to divide your data into control and treatment groups. - -#### Available Splitters - -| Splitter | Use Case | Example | -|----------|----------|---------| -| `NonClusteredSplitter` | Individual-level randomization | User-level A/B test | -| `ClusteredSplitter` | Cluster-level randomization | Store-level test | -| `SwitchbackSplitter` | Time-based alternating treatment | Daily switchback | -| `StratifiedClusteredSplitter` | Balanced cluster randomization | Stratified by region | - -#### Example - -```python -from cluster_experiments import ClusteredSplitter - -splitter = ClusteredSplitter( - cluster_cols=['store_id'], # Randomize at store level -) -``` - ---- - -### 2. Perturbator: Simulating Treatment Effect - -The **Perturbator** simulates the treatment effect on your historical data. This lets you test "what if we had run an experiment with X% lift?" 
- -#### Available Perturbators - -| Perturbator | Effect Type | Example | -|-------------|-------------|---------| -| `ConstantPerturbator` | Absolute increase | +$5 revenue | -| `RelativePositivePerturbator` | Percentage increase | +10% revenue | -| `BinaryPerturbator` | Binary outcome shift | +5% conversion | -| `NormalPerturbator` | Normally distributed | Variable effect | - -#### Example - -```python -from cluster_experiments import ConstantPerturbator - -perturbator = ConstantPerturbator( - average_effect=5.0 # Add $5 to treatment group -) - -# Or for relative effects -from cluster_experiments import RelativePositivePerturbator - -perturbator = RelativePositivePerturbator( - average_effect=0.10 # 10% increase -) -``` - ---- - -### 3. Analysis: Measuring Impact - -The **Analysis** component specifies which statistical method to use for measuring the treatment effect. - -#### Available Analysis Methods - -| Analysis | Use Case | -|----------|----------| -| `OLSAnalysis` | Standard A/B test | -| `ClusteredOLSAnalysis` | Cluster randomization with clustered SE | -| `TTestClusteredAnalysis` | T-test on cluster-aggregated data | -| `GeeExperimentAnalysis` | Correlated observations (GEE) | -| `SyntheticControlAnalysis` | Observational studies | - -#### Example - -```python -from cluster_experiments import ClusteredOLSAnalysis - -analysis = ClusteredOLSAnalysis( - cluster_cols=['store_id'], # Cluster standard errors -) -``` - ---- - -## Complete Example: Store-Level Experiment - -Let's design a store-level promotional experiment: - -```python -import pandas as pd -import numpy as np -from cluster_experiments import PowerAnalysis -from cluster_experiments import ClusteredSplitter, RelativePositivePerturbator, ClusteredOLSAnalysis - -# Historical data: daily store sales -np.random.seed(42) -n_stores = 50 -days = 30 - -historical_data = [] -for store_id in range(n_stores): - for day in range(days): - historical_data.append({ - 'store_id': store_id, - 'day': day, - 'revenue': np.random.gamma(shape=100, scale=5) + np.random.normal(0, 50) - }) - -df = pd.DataFrame(historical_data) - -# Define power analysis components -splitter = ClusteredSplitter(cluster_cols=['store_id']) -perturbator = RelativePositivePerturbator() # % increase -analysis = ClusteredOLSAnalysis(cluster_cols=['store_id']) - -power_analysis = PowerAnalysis( - splitter=splitter, - perturbator=perturbator, - analysis=analysis, - target_col='revenue', - n_simulations=500 -) - -# Question 1: What power do we have for 10% lift? -power = power_analysis.power_analysis(df, average_effect=0.10) -print(f"Power for 10% lift: {power:.1%}") - -# Question 2: How does power change with effect size? 
-power_curve = power_analysis.power_line(
-    df,
-    average_effects=[0.05, 0.10, 0.15, 0.20]
-)
-print("\nPower Curve:")
-print(power_curve)
-```
-
----
-
-## Power Curves and Timelines
-
-### Power Curve (Effect Size)
-
-See how power changes with different effect sizes:
-
-```python
-power_curve = power_analysis.power_line(
-    df,
-    average_effects=[0.03, 0.05, 0.07, 0.10, 0.15]
-)
-```
-
-### MDE Timeline (Experiment Duration)
-
-See how MDE changes with experiment length:
-
-```python
-from cluster_experiments import NormalPowerAnalysis
-
-npw = NormalPowerAnalysis.from_dict({
-    'analysis': 'clustered_ols',
-    'cluster_cols': ['store_id'],
-    'time_col': 'date',
-})
-
-mde_timeline = npw.mde_time_line(
-    df,
-    powers=[0.8],  # 80% power
-    experiment_length=[7, 14, 21, 30]  # days
-)
-```
-
----
-
-## Dictionary Configuration
-
-For simpler setups, use dictionary configuration:
-
-```python
-from cluster_experiments import PowerAnalysis
-
-config = {
-    'splitter': 'clustered',
-    'cluster_cols': ['store_id'],
-    'perturbator': 'relative_positive',
-    'analysis': 'clustered_ols',
-    'n_simulations': 500,
-}
-
-power_analysis = PowerAnalysis.from_dict(config)
-```
-
----
-
-## Tips and Best Practices
-
-### 1. Use Historical Data
-
-- Use real historical data that matches your experiment setup
-- More data = more reliable power estimates
-- Ensure your historical period is representative
-
-### 2. Match Components to Design
-
-- If the experiment is cluster-randomized, use `ClusteredSplitter` and `ClusteredOLSAnalysis`
-- If individual-level, use `NonClusteredSplitter` and `OLSAnalysis`
-- Match the perturbator to the expected effect type (absolute vs relative)
-
-### 3. Simulation Count
-
-- More simulations = more accurate but slower
-- Start with 100-500 for exploration
-- Use 1000+ for final estimates
-
-### 4. Power Standards
-
-- **80% power** is standard (an 80% chance of detecting the effect if it exists)
-- **Higher power** requires a larger sample size or a longer duration
-- Consider business tradeoffs (speed vs certainty)
-
----
-
-## Common Questions
-
-### Q: What's the difference between power analysis and experiment analysis?
-
-**Power analysis** (before the experiment):
-- Uses historical data
-- Simulates different scenarios
-- Answers: "How much data do I need?"
-
-**Experiment analysis** (after the experiment):
-- Uses actual experiment data
-- Measures real treatment effects
-- Answers: "What was the impact?"
-
-### Q: When should I use simulation vs normal approximation?
-
-**Normal approximation:**
-- βœ… Fast results
-- βœ… Standard experimental designs
-- βœ… Large sample sizes
-
-**Simulation:**
-- βœ… Complex designs (switchback, stratified)
-- βœ… Non-normal distributions
-- βœ… Small sample sizes
-
-### Q: My power is too low, what can I do?
-
-Options to increase power:
-1. **Increase sample size** (more users, e.g., by running the experiment longer)
-2. **Use variance reduction** (CUPAC/CUPED)
-3. **Detect larger effects** (focus on bigger changes)
-4. 
**Use more sensitive metrics** - ---- - -## Next Steps - -- **[Normal Power Example](normal_power.html)** - Compare simulation vs normal approximation -- **[Power Lines Example](normal_power_lines.html)** - Visualize power curves -- **[Switchback Power](switchback.html)** - Power analysis for switchback designs -- **[API Reference](api/power_analysis.html)** - Detailed power analysis documentation - diff --git a/docs/quickstart.md b/docs/quickstart.md index f4b0ae81..edb77553 100644 --- a/docs/quickstart.md +++ b/docs/quickstart.md @@ -13,7 +13,7 @@ pip install cluster-experiments ``` !!! info "Requirements" - - **Python 3.9 or higher** + - **Python 3.8 or higher** - Main dependencies: `pandas`, `numpy`, `scipy`, `statsmodels` --- @@ -25,11 +25,14 @@ Let's analyze a simple A/B test with multiple metrics. This is the most common u ```python import pandas as pd import numpy as np -from cluster_experiments import AnalysisPlan +from cluster_experiments import ( + AnalysisPlan, SimpleMetric, RatioMetric, + Variant, HypothesisTest +) # Simulate experiment data np.random.seed(42) -n_users = 1000 +n_users = 5000 data = pd.DataFrame({ 'user_id': range(n_users), @@ -42,22 +45,64 @@ data = pd.DataFrame({ data.loc[data['variant'] == 'treatment', 'orders'] += np.random.poisson(0.5, (data['variant'] == 'treatment').sum()) data['converted'] = (data['orders'] > 0).astype(int) -# Define analysis plan -analysis_plan = AnalysisPlan.from_metrics_dict({ - 'metrics': [ - # Simple metric - {'alias': 'conversions', 'name': 'converted', 'metric_type': 'simple'}, - # Ratio metric - {'alias': 'conversion_rate', 'metric_type': 'ratio', - 'numerator': 'converted', 'denominator': 'visits'}, - ], - 'variants': [ - {'name': 'control', 'is_control': True}, - {'name': 'treatment', 'is_control': False}, - ], - 'variant_col': 'variant', - 'analysis_type': 'ols', -}) +# Define metrics by type and category +absolute_metrics = { + "orders": "revenue" # metric_name: category +} + +ratio_metrics = { + "conversion_rate": { + "category": "conversion", + "components": ["converted", "visits"] # [numerator, denominator] + } +} + +# Define variants +variants = [ + Variant("control", is_control=True), + Variant("treatment", is_control=False) +] + +# Build hypothesis tests from metric definitions +hypothesis_tests = [] + +# 1. Ratio metrics: use delta method for proper ratio analysis +for metric_name, config in ratio_metrics.items(): + metric = RatioMetric( + alias=f"{config['category']}__{metric_name}", + numerator_name=config['components'][0], + denominator_name=config['components'][1] + ) + hypothesis_tests.append( + HypothesisTest( + metric=metric, + analysis_type="delta", + analysis_config={ + "scale_col": metric.denominator_name, + "cluster_cols": ["user_id"] + } + ) + ) + +# 2. Absolute metrics: use standard OLS +for metric_name, category in absolute_metrics.items(): + metric = SimpleMetric( + alias=f"{category}__{metric_name}", + name=metric_name + ) + hypothesis_tests.append( + HypothesisTest( + metric=metric, + analysis_type="ols" + ) + ) + +# Create and run analysis plan +analysis_plan = AnalysisPlan( + tests=hypothesis_tests, + variants=variants, + variant_col='variant' +) # Run analysis results = analysis_plan.analyze(data) @@ -80,11 +125,9 @@ The results dataframe includes: | `ate` | Average Treatment Effect (absolute difference) | | `ate_ci_lower/upper` | 95% confidence interval for ATE | | `p_value` | Statistical significance (< 0.05 = significant) | -| `relative_effect` | Percentage change (lift) | !!! 
tip "Interpreting Results" - **p_value < 0.05**: Result is statistically significant - - **relative_effect**: Shows % change (e.g., 0.10 = 10% increase) - **Confidence interval**: If it doesn't include 0, effect is significant --- @@ -111,20 +154,32 @@ results = analysis_plan.analyze(experiment_data) **Example:** Calculate power or Minimum Detectable Effect (MDE). ```python +import numpy as np +import pandas as pd from cluster_experiments import NormalPowerAnalysis +# Create historical data +np.random.seed(42) +historical_data = pd.DataFrame({ + 'user_id': range(500), + 'metric': np.random.normal(100, 20, 500), + 'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, 500), unit='d') +}) + # Define your analysis setup power_analysis = NormalPowerAnalysis.from_dict({ 'analysis': 'ols', 'splitter': 'non_clustered', + 'target_col': 'metric', + 'time_col': 'date' }) # Calculate MDE for 80% power mde = power_analysis.mde(historical_data, power=0.8) -print(f"Need {mde:.2%} effect size for 80% power") +print(f"Need {mde:.2f} effect size for 80% power") # Or calculate power for a given effect size -power = power_analysis.power_analysis(historical_data, average_effect=0.05) +power = power_analysis.power_analysis(historical_data, average_effect=5.0) print(f"Power: {power:.1%}") ``` @@ -141,9 +196,32 @@ print(f"Power: {power:.1%}") **Example:** ```python +import pandas as pd +import numpy as np +from cluster_experiments import AnalysisPlan + +# Simulate store-level experiment data +np.random.seed(42) +n_stores = 50 +transactions_per_store = 100 + +data = [] +for store_id in range(n_stores): + variant = np.random.choice(['control', 'treatment']) + n_trans = np.random.poisson(transactions_per_store) + + store_data = pd.DataFrame({ + 'store_id': store_id, + 'variant': variant, + 'purchase_amount': np.random.normal(50, 20, n_trans) + }) + data.append(store_data) + +experiment_data = pd.concat(data, ignore_index=True) + # Use clustered_ols for cluster-randomized experiments analysis_plan = AnalysisPlan.from_metrics_dict({ - 'metrics': [{'alias': 'revenue', 'name': 'purchase_amount'}], + 'metrics': [{'alias': 'revenue', 'name': 'purchase_amount', 'metric_type': 'simple'}], 'variants': [ {'name': 'control', 'is_control': True}, {'name': 'treatment', 'is_control': False}, @@ -154,6 +232,9 @@ analysis_plan = AnalysisPlan.from_metrics_dict({ 'cluster_cols': ['store_id'] # ← Specify clustering variable } }) + +results = analysis_plan.analyze(experiment_data) +print(results.to_dataframe()) ``` **Learn more:** See [Cluster Randomization Example](examples/cluster_randomization.html). 
@@ -169,7 +250,29 @@ analysis_plan = AnalysisPlan.from_metrics_dict({ **Example:** ```python -from cluster_experiments import TargetAggregation, HypothesisTest, SimpleMetric, Variant +import pandas as pd +import numpy as np +from cluster_experiments import ( + AnalysisPlan, TargetAggregation, HypothesisTest, + SimpleMetric, Variant +) + +# Simulate experiment data +np.random.seed(42) +n_customers = 1000 + +experiment_data = pd.DataFrame({ + 'customer_id': range(n_customers), + 'variant': np.random.choice(['control', 'treatment'], n_customers), + 'order_value': np.random.normal(100, 20, n_customers), + 'customer_age': np.random.randint(20, 60, n_customers), +}) + +# Simulate pre-experiment data (historical) +pre_experiment_data = pd.DataFrame({ + 'customer_id': range(n_customers), + 'order_value': np.random.normal(95, 25, n_customers), # Historical order values +}) # Define CUPAC model using pre-experiment data cupac_model = TargetAggregation( @@ -193,12 +296,13 @@ test = HypothesisTest( plan = AnalysisPlan( tests=[test], - variants=[Variant("control", is_control=True), Variant("treatment")], + variants=[Variant("control", is_control=True), Variant("treatment", is_control=False)], variant_col="variant", ) # Analyze with both experiment and pre-experiment data results = plan.analyze(experiment_data, pre_experiment_data) +print(results.to_dataframe()) ``` **Learn more:** See [CUPAC Example](cupac_example.html). @@ -207,15 +311,15 @@ results = plan.analyze(experiment_data, pre_experiment_data) ## Ratio Metrics -`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value). +`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value), as seen in the first example: ```python # Ratio metric: conversions / visits { 'alias': 'conversion_rate', 'metric_type': 'ratio', - 'numerator': 'converted', # Numerator column - 'denominator': 'visits' # Denominator column + 'numerator_name': 'converted', # Numerator column + 'denominator_name': 'visits' # Denominator column } ``` diff --git a/mkdocs.yml b/mkdocs.yml index 49784f27..89fc1c59 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -11,7 +11,7 @@ site_dir: site nav: - Home: index.md - Quickstart: quickstart.md - - Power Analysis Guide: power_analysis_guide.md + - Power Analysis: normal_power_lines.ipynb - API Reference: - Experiment Analysis: From 55ac747d37ab5f6f23d24a9fd0f215195484f46f Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Sat, 1 Nov 2025 16:47:15 +0100 Subject: [PATCH 7/9] update simple examples --- docs/examples/cluster_randomization.ipynb | 167 ++++++++- docs/examples/simple_ab_test.ipynb | 406 +++++++++++++++++++--- 2 files changed, 509 insertions(+), 64 deletions(-) diff --git a/docs/examples/cluster_randomization.ipynb b/docs/examples/cluster_randomization.ipynb index 3ca950cb..21b71313 100644 --- a/docs/examples/cluster_randomization.ipynb +++ b/docs/examples/cluster_randomization.ipynb @@ -25,7 +25,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ @@ -48,9 +48,95 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 11, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Total transactions: 5,055\n", + "Stores in control: 23\n", + "Stores in treatment: 27\n", + "\n", + "First few rows:\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
store_idvariantpurchase_amount
00control83.479541
10control78.039264
20control65.286167
30control63.589803
40control94.543677
\n", + "
" + ], + "text/plain": [ + " store_id variant purchase_amount\n", + "0 0 control 83.479541\n", + "1 0 control 78.039264\n", + "2 0 control 65.286167\n", + "3 0 control 63.589803\n", + "4 0 control 94.543677" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Define parameters\n", "n_stores = 50 # Number of stores (clusters)\n", @@ -109,9 +195,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 12, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Naive Analysis (Ignoring Clusters) ===\n", + "Treatment Effect: $4.26\n", + "Standard Error: $0.63\n", + "P-value: 0.0000\n", + "95% CI: [$3.03, $5.48]\n" + ] + } + ], "source": [ "# Naive analysis without clustering\n", "naive_plan = AnalysisPlan.from_metrics_dict({\n", @@ -133,7 +231,7 @@ "naive_results = naive_plan.analyze(data).to_dataframe()\n", "print(\"=== Naive Analysis (Ignoring Clusters) ===\")\n", "print(f\"Treatment Effect: ${naive_results.iloc[0]['ate']:.2f}\")\n", - "print(f\"Standard Error: ${naive_results.iloc[0]['ate_se']:.2f}\")\n", + "print(f\"Standard Error: ${naive_results.iloc[0]['std_error']:.2f}\")\n", "print(f\"P-value: {naive_results.iloc[0]['p_value']:.4f}\")\n", "print(f\"95% CI: [${naive_results.iloc[0]['ate_ci_lower']:.2f}, ${naive_results.iloc[0]['ate_ci_upper']:.2f}]\")\n" ] @@ -149,9 +247,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 13, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "=== Correct Analysis (With Clustering) ===\n", + "Treatment Effect: $4.26\n", + "Standard Error: $3.04\n", + "P-value: 0.1610\n", + "95% CI: [$-1.70, $10.21]\n" + ] + } + ], "source": [ "# Correct analysis with clustered standard errors\n", "clustered_plan = AnalysisPlan.from_metrics_dict({\n", @@ -176,7 +286,7 @@ "clustered_results = clustered_plan.analyze(data).to_dataframe()\n", "print(\"=== Correct Analysis (With Clustering) ===\")\n", "print(f\"Treatment Effect: ${clustered_results.iloc[0]['ate']:.2f}\")\n", - "print(f\"Standard Error: ${clustered_results.iloc[0]['ate_se']:.2f}\")\n", + "print(f\"Standard Error: ${clustered_results.iloc[0]['std_error']:.2f}\")\n", "print(f\"P-value: {clustered_results.iloc[0]['p_value']:.4f}\")\n", "print(f\"95% CI: [${clustered_results.iloc[0]['ate_ci_lower']:.2f}, ${clustered_results.iloc[0]['ate_ci_upper']:.2f}]\")\n" ] @@ -192,9 +302,24 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "=== Comparison ===\n", + " Method Treatment Effect Standard Error P-value 95% CI\n", + " Naive (OLS) $4.26 $0.63 0.0000 [$3.03, $5.48]\n", + "Correct (Clustered OLS) $4.26 $3.04 0.1610 [$-1.70, $10.21]\n", + "\n", + "Notice: The clustered standard errors are LARGER, reflecting the\n", + "additional uncertainty from intra-cluster correlation.\n" + ] + } + ], "source": [ "comparison = pd.DataFrame({\n", " 'Method': ['Naive (OLS)', 'Correct (Clustered OLS)'],\n", @@ -203,8 +328,8 @@ " f\"${clustered_results.iloc[0]['ate']:.2f}\"\n", " ],\n", " 'Standard Error': [\n", - " f\"${naive_results.iloc[0]['ate_se']:.2f}\",\n", - " f\"${clustered_results.iloc[0]['ate_se']:.2f}\"\n", + " f\"${naive_results.iloc[0]['std_error']:.2f}\",\n", + " f\"${clustered_results.iloc[0]['std_error']:.2f}\"\n", " ],\n", " 'P-value': [\n", 
" f\"{naive_results.iloc[0]['p_value']:.4f}\",\n", @@ -247,8 +372,22 @@ } ], "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, "language_info": { - "name": "python" + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" } }, "nbformat": 4, diff --git a/docs/examples/simple_ab_test.ipynb b/docs/examples/simple_ab_test.ipynb index 34eb42a6..61877c71 100644 --- a/docs/examples/simple_ab_test.ipynb +++ b/docs/examples/simple_ab_test.ipynb @@ -20,21 +20,9 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 13, "metadata": {}, - "outputs": [ - { - "ename": "ModuleNotFoundError", - "evalue": "No module named 'pandas'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", - "Cell \u001b[0;32mIn[1], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mpandas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mpd\u001b[39;00m\n\u001b[1;32m 2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mnumpy\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mas\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mnp\u001b[39;00m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;21;01mcluster_experiments\u001b[39;00m\u001b[38;5;250m \u001b[39m\u001b[38;5;28;01mimport\u001b[39;00m AnalysisPlan\n", - "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'pandas'" - ] - } - ], + "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", @@ -55,9 +43,150 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 14, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset shape: (2000, 5)\n", + "\n", + "First few rows:\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
user_idvariantvisitsconvertedrevenue
00control7190.366149
11treatment1400.000000
22control1300.000000
33control700.000000
44control1600.000000
55treatment700.000000
66control1500.000000
77control1200.000000
88control1600.000000
99treatment800.000000
\n", + "
" + ], + "text/plain": [ + " user_id variant visits converted revenue\n", + "0 0 control 7 1 90.366149\n", + "1 1 treatment 14 0 0.000000\n", + "2 2 control 13 0 0.000000\n", + "3 3 control 7 0 0.000000\n", + "4 4 control 16 0 0.000000\n", + "5 5 treatment 7 0 0.000000\n", + "6 6 control 15 0 0.000000\n", + "7 7 control 12 0 0.000000\n", + "8 8 control 16 0 0.000000\n", + "9 9 treatment 8 0 0.000000" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "n_users = 2000\n", "\n", @@ -102,40 +231,79 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Analysis plan created successfully!\n" + ] + } + ], "source": [ - "analysis_plan = AnalysisPlan.from_metrics_dict({\n", - " 'metrics': [\n", - " # Simple metric: total conversions\n", - " {\n", - " 'alias': 'conversions',\n", - " 'name': 'converted',\n", - " 'metric_type': 'simple'\n", - " },\n", - " # Ratio metric: conversion rate\n", - " {\n", - " 'alias': 'conversion_rate', \n", - " 'metric_type': 'ratio',\n", - " 'numerator_name': 'converted',\n", - " 'denominator_name': 'visits'\n", - " },\n", - " # Simple metric: total revenue\n", - " {\n", - " 'alias': 'revenue',\n", - " 'name': 'revenue',\n", - " 'metric_type': 'simple'\n", - " },\n", - " ],\n", - " 'variants': [\n", - " {'name': 'control', 'is_control': True},\n", - " {'name': 'treatment', 'is_control': False},\n", - " ],\n", - " 'variant_col': 'variant',\n", - " 'analysis_type': 'ols', # Use OLS for simple A/B test\n", - " 'alpha': 0.05, # 95% confidence level\n", - "})\n", + "from cluster_experiments import (\n", + " AnalysisPlan, SimpleMetric, RatioMetric, \n", + " Variant, HypothesisTest\n", + ")\n", + "\n", + "# Define metrics by type\n", + "simple_metrics = {\n", + " \"conversions\": \"converted\", # alias: column_name\n", + " \"revenue\": \"revenue\"\n", + "}\n", + "\n", + "ratio_metrics = {\n", + " \"conversion_rate\": {\n", + " \"numerator\": \"converted\",\n", + " \"denominator\": \"visits\"\n", + " }\n", + "}\n", + "\n", + "# Define variants\n", + "variants = [\n", + " Variant(\"control\", is_control=True),\n", + " Variant(\"treatment\", is_control=False)\n", + "]\n", + "\n", + "# Build hypothesis tests\n", + "hypothesis_tests = []\n", + "\n", + "# Ratio metrics: use delta method\n", + "for alias, config in ratio_metrics.items():\n", + " metric = RatioMetric(\n", + " alias=alias,\n", + " numerator_name=config[\"numerator\"],\n", + " denominator_name=config[\"denominator\"]\n", + " )\n", + " hypothesis_tests.append(\n", + " HypothesisTest(\n", + " metric=metric,\n", + " analysis_type=\"delta\",\n", + " analysis_config={\n", + " \"scale_col\": metric.denominator_name,\n", + " \"cluster_cols\": [\"user_id\"]\n", + " }\n", + " )\n", + " )\n", + "\n", + "# Simple metrics: use OLS\n", + "for alias, column_name in simple_metrics.items():\n", + " metric = SimpleMetric(alias=alias, name=column_name)\n", + " hypothesis_tests.append(\n", + " HypothesisTest(\n", + " metric=metric,\n", + " analysis_type=\"ols\"\n", + " )\n", + " )\n", + "\n", + "# Create analysis plan\n", + "analysis_plan = AnalysisPlan(\n", + " tests=hypothesis_tests,\n", + " variants=variants,\n", + " variant_col='variant'\n", + ")\n", "\n", "print(\"Analysis plan created successfully!\")\n" ] @@ -151,9 +319,147 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": {}, - "outputs": [], + 
"outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "=== Experiment Results ===\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/luiz.henrique/Documents/GitHub/cluster-experiments/.venv/lib/python3.9/site-packages/cluster_experiments/experiment_analysis.py:1671: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.\n", + " return df.groupby(self.treatment_col).apply(\n", + "/Users/luiz.henrique/Documents/GitHub/cluster-experiments/.venv/lib/python3.9/site-packages/cluster_experiments/experiment_analysis.py:1676: UserWarning: Delta Method approximation may not be accurate for small group sizes\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
metric_aliascontrol_variant_nametreatment_variant_namecontrol_variant_meantreatment_variant_meananalysis_typeateate_ci_lowerate_ci_upperp_valuestd_errordimension_namedimension_valuealpha
0conversion_ratecontroltreatment0.0099720.011912delta0.001940-0.0008250.0047060.1690060.001411__total_dimensiontotal0.05
1conversionscontroltreatment0.1003940.117886ols0.017492-0.0098740.0448590.2102850.013963__total_dimensiontotal0.05
2revenuecontroltreatment5.4515157.359327ols1.907812-0.1304883.9461120.0665811.039968__total_dimensiontotal0.05
\n", + "
" + ], + "text/plain": [ + " metric_alias control_variant_name treatment_variant_name \\\n", + "0 conversion_rate control treatment \n", + "1 conversions control treatment \n", + "2 revenue control treatment \n", + "\n", + " control_variant_mean treatment_variant_mean analysis_type ate \\\n", + "0 0.009972 0.011912 delta 0.001940 \n", + "1 0.100394 0.117886 ols 0.017492 \n", + "2 5.451515 7.359327 ols 1.907812 \n", + "\n", + " ate_ci_lower ate_ci_upper p_value std_error dimension_name \\\n", + "0 -0.000825 0.004706 0.169006 0.001411 __total_dimension \n", + "1 -0.009874 0.044859 0.210285 0.013963 __total_dimension \n", + "2 -0.130488 3.946112 0.066581 1.039968 __total_dimension \n", + "\n", + " dimension_value alpha \n", + "0 total 0.05 \n", + "1 total 0.05 \n", + "2 total 0.05 " + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Run analysis\n", "results = analysis_plan.analyze(data)\n", From 0a4af8b844bf68f5ada2c1019076421a141b4bd9 Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Sat, 1 Nov 2025 16:48:41 +0100 Subject: [PATCH 8/9] rollback gitignore --- .gitignore | 2 -- 1 file changed, 2 deletions(-) diff --git a/.gitignore b/.gitignore index 208e6e22..e40cd467 100644 --- a/.gitignore +++ b/.gitignore @@ -171,5 +171,3 @@ todos.txt # experiments/ cluster-experiments.code-workspace -QUICKSTART_RESTRUCTURE.md -DOCUMENTATION_REVAMP_SUMMARY.md From 601ec18efb30620b1fe1dfdef05066dfd7fbba56 Mon Sep 17 00:00:00 2001 From: luizhsuperti Date: Sat, 22 Nov 2025 16:10:34 +0100 Subject: [PATCH 9/9] quickstart-readme-structure revamped Reformulate the home and quickstart, as well as the examples and API reference (Examples will probably need to be evaluated case by case) --- README.md | 307 ++++++++--------------- docs/license.md | 21 ++ docs/quick_start_power_curve.png | Bin 0 -> 26909 bytes docs/quickstart.md | 409 +++++++++++-------------------- mkdocs.yml | 54 ++-- 5 files changed, 298 insertions(+), 493 deletions(-) create mode 100644 docs/license.md create mode 100644 docs/quick_start_power_curve.png diff --git a/README.md b/README.md index 7fd4f312..64b061ad 100644 --- a/README.md +++ b/README.md @@ -9,6 +9,7 @@ ![License](https://img.shields.io/github/license/david26694/cluster-experiments) [![Pypi version](https://img.shields.io/pypi/pyversions/cluster-experiments.svg)](https://pypi.python.org/pypi/cluster-experiments) + **`cluster-experiments`** is a comprehensive Python library for **end-to-end A/B testing workflows**, from experiment design to statistical analysis. ## πŸ“– What is cluster-experiments? 
@@ -34,7 +35,7 @@ While the library is aimed at these scenarios, it's equally capable of handling - Simulation-based (Monte Carlo) for any design complexity - Analytical, (CLT-based) for standard designs - Minimal Detectable Effect (MDE) estimation - + - **Multiple Experimental Designs** - Standard A/B tests with individual randomization - Cluster-randomized experiments @@ -73,200 +74,51 @@ pip install cluster-experiments ## ⚑ Quick Example -Here's a simple example showing how to analyze an experiment with multiple metrics organized by category - a common production pattern: +Here's how to run an analysis in just a few lines: ```python import pandas as pd import numpy as np -from cluster_experiments import ( - AnalysisPlan, SimpleMetric, RatioMetric, - Variant, HypothesisTest -) +from cluster_experiments import AnalysisPlan, Variant -# Simulate experiment data np.random.seed(42) -n_users = 5000 -data = pd.DataFrame({ - 'user_id': range(n_users), - 'variant': np.random.choice(['control', 'treatment'], n_users), - 'orders': np.random.poisson(2.5, n_users), # Number of orders - 'visits': np.random.poisson(10, n_users), # Number of visits +# 0. Create simple data +N = 1_000 +df = pd.DataFrame({ + "variant": np.random.choice(["control", "treatment"], N), + "orders": np.random.poisson(10, N), + "visits": np.random.poisson(100, N), +}) +df["converted"] = (df["orders"] > 0).astype(int) + + +# 1. Define your analysis plan +plan = AnalysisPlan.from_metrics_dict({ + "metrics": [ + {"name": "orders", "alias": "revenue", "metric_type": "simple"}, + {"name": "converted", "alias": "conversion", "metric_type": "ratio", "numerator": "converted", "denominator": "visits"} + ], + "variants": [ + {"name": "control", "is_control": True}, + {"name": "treatment", "is_control": False} + ], + "variant_col": "variant", + "analysis_type": "ols" }) -# Add treatment effect: +20% orders for treatment -data.loc[data['variant'] == 'treatment', 'orders'] += \ - np.random.poisson(0.5, (data['variant'] == 'treatment').sum()) - -# Prepare data -data['converted'] = data['orders'].astype(int) - -# Define metrics by type and category -absolute_metrics = { - "orders": "revenue" # metric_name: category -} - -ratio_metrics = { - "conversion_rate": { - "category": "conversion", - "components": ["converted", "visits"] # [numerator, denominator] - } -} - -# Define variants -variants = [ - Variant("control", is_control=True), - Variant("treatment", is_control=False) -] - -# Build hypothesis tests from metric definitions -hypothesis_tests = [] - -# 1. Ratio metrics: use delta method for proper ratio analysis -for metric_name, config in ratio_metrics.items(): - metric = RatioMetric( - alias=f"{config['category']}__{metric_name}", - numerator_name=config['components'][0], - denominator_name=config['components'][1] - ) - hypothesis_tests.append( - HypothesisTest( - metric=metric, - analysis_type="delta", - analysis_config={ - "scale_col": metric.denominator_name, - "cluster_cols": ["user_id"] - } - ) - ) - -# 2. Absolute metrics: use standard OLS -for metric_name, category in absolute_metrics.items(): - metric = SimpleMetric( - alias=f"{category}__{metric_name}", - name=metric_name - ) - hypothesis_tests.append( - HypothesisTest( - metric=metric, - analysis_type="ols" - ) - ) - -# Create and run analysis plan -analysis_plan = AnalysisPlan( - tests=hypothesis_tests, - variants=variants, - variant_col='variant' -) - -results = analysis_plan.analyze(data, verbose=True) -results_df = results.to_dataframe() -print(results_df) +# 2. 
Run analysis on your dataframe +results = plan.analyze(df) +print(results.to_dataframe().head()) ``` -**Output**: A comprehensive scorecard with treatment effects, confidence intervals, and p-values: - +**Output Example**: ``` - metric_alias control treatment ate p_value ... - conversion__conversion_rate 0.250 0.303 +20.9% < 0.001 ... - revenue__orders 2.510 3.005 +19.7% < 0.001 ... + metric_alias control_variant_name treatment_variant_name control_variant_mean treatment_variant_mean analysis_type ate ate_ci_lower ate_ci_upper p_value std_error dimension_name dimension_value alpha +0 revenue control treatment 10.08554 9.941061 ols -1.444788e-01 -5.446603e-01 2.557026e-01 0.479186 2.041780e-01 __total_dimension total 0.05 +1 conversion control treatment 1.00000 1.000000 ols 1.110223e-16 -1.096504e-16 3.316950e-16 0.324097 1.125902e-16 __total_dimension total 0.05 ``` -This example demonstrates: -- βœ… **Organized metric definitions** - Group metrics by type and category -- βœ… **Multiple analysis methods** - Delta method for ratios, OLS for totals -- βœ… **Scalable** - Easy to add more metrics by updating dictionaries - ---- - -## πŸ“š Documentation - -For detailed guides, API references, and advanced examples, visit our [**documentation**](https://david26694.github.io/cluster-experiments/). - -### Key Resources -- [**Quickstart Guide**](https://david26694.github.io/cluster-experiments/quickstart.html): Get up and running in minutes -- [**API Reference**](https://david26694.github.io/cluster-experiments/api/experiment_analysis.html): Detailed class and method documentation -- [**Example Gallery**](https://david26694.github.io/cluster-experiments/cupac_example.html): Real-world use cases and patterns - ---- - -## Core Concepts - -The library is built around three main components: - -### 1. **Splitter** - Define how to randomize -Choose how to split your data into control and treatment groups: -- `NonClusteredSplitter`: Standard individual-level randomization -- `ClusteredSplitter`: Cluster-level randomization -- `SwitchbackSplitter`: Time-based alternating treatments -- `StratifiedClusteredSplitter`: Balance randomization across strata - -### 2. **Analysis** - Measure the impact -Select the appropriate statistical method for your design: -- `OLSAnalysis`: Standard regression for A/B tests -- `ClusteredOLSAnalysis`: Clustered standard errors for cluster-randomized designs -- `TTestClusteredAnalysis`: T-tests on cluster-aggregated data -- `GeeExperimentAnalysis`: GEE for correlated observations -- `SyntheticControlAnalysis`: Observational studies with synthetic controls - -### 3. **AnalysisPlan** - Orchestrate your analysis -Define your complete analysis workflow: -- Specify metrics (simple and ratio) -- Define variants and dimensions -- Configure hypothesis tests -- Generate comprehensive scorecards - -For **power analysis**, combine these with: -- **Perturbator**: Simulate treatment effects for power calculations -- **PowerAnalysis**: Estimate statistical power and sample sizes - ---- - -## When to Use cluster-experiments - -βœ… **Use cluster-experiments when you need to:** -- Design and analyze **cluster-randomized experiments** -- Handle **switchback/crossover designs** -- Account for **network effects or spillover** -- Perform **power analysis** for complex designs -- Reduce variance with **CUPED/CUPAC** -- Analyze **multiple metrics** with dimensional slicing -- Work with **ratio metrics** (rates, averages, etc.) 
- - **Perfect for:** - - A/B tests -- Marketplace/platform experiments (drivers, restaurants, stores,...) -- Geographic experiments (cities, regions) -- Time-based tests (switchbacks, dayparting) - ---- - -## πŸ› οΈ Advanced Features - -### Variance Reduction (CUPED/CUPAC) - -Reduce variance and detect smaller effects by leveraging pre-experiment data. Use historical metrics as covariates to control for pre-existing differences between groups. - -**Use cases:** -- Have pre-experiment metrics for your users/clusters -- Want to detect smaller treatment effects -- Need more sensitive tests with same sample size - -See the [CUPAC Example](https://david26694.github.io/cluster-experiments/cupac_example.html) for detailed implementation. - -### Cluster Randomization - -Handle experiments where randomization occurs at group level (stores, cities, regions) rather than individual level. Essential for managing spillover effects and operational constraints. - -See the [Cluster Randomization Guide](https://david26694.github.io/cluster-experiments/examples/cluster_randomization.html) for details. - -### Switchback Experiments - -Design and analyze time-based crossover experiments where the same units receive both control and treatment at different times. - -See the [Switchback Example](https://david26694.github.io/cluster-experiments/switchback.html) for implementation. - --- ## Power Analysis @@ -309,6 +161,9 @@ power_curve = power_analysis.power_line( historical_data, average_effects=[2.0, 4.0, 6.0, 8.0, 10.0] ) +print(power_curve) +# Tip: You can plot this using matplotlib: +# plt.plot(power_curve['average_effect'], power_curve['power']) # 4. MDE timeline: How MDE changes with experiment length mde_timeline = power_analysis.mde_time_line( @@ -320,22 +175,9 @@ mde_timeline = power_analysis.mde_time_line( **Output:** ``` -Power for detecting +5 unit effect: 81.1% -Minimum detectable effect at 80% power: 4.93 - -Power Curve: - effect power - 2.0 20.6% - 4.0 62.2% - 6.0 92.6% - 8.0 99.5% - 10.0 100.0% - -MDE Timeline (experiment length β†’ MDE): - 7 days: 10.64 - 14 days: 7.62 - 21 days: 6.14 - 30 days: 4.93 +Power for detecting +5 unit effect: 72.7% +Minimum detectable effect at 80% power: 5.46 +{2.0: 0.17658708766689768, 4.0: 0.5367343456559069, 6.0: 0.8682558423423066, 8.0: 0.983992856563122, 10.0: 0.9992385426477484} ``` **Key methods:** @@ -348,19 +190,74 @@ For simulation-based power analysis (for complex designs), see the [Power Analys --- -## 🀝 Contributing +## πŸ“š Documentation + +For detailed guides, API references, and advanced examples, visit our [**documentation**](https://david26694.github.io/cluster-experiments/). + +### Core Concepts + +The library is built around three main components: + +#### 1. **Splitter** - Define how to randomize + +Choose how to split your data into control and treatment groups: + +- `NonClusteredSplitter`: Standard individual-level randomization +- `ClusteredSplitter`: Cluster-level randomization +- `SwitchbackSplitter`: Time-based alternating treatments +- `StratifiedClusteredSplitter`: Balance randomization across strata + +#### 2. 
**Analysis** - Measure the impact + +Select the appropriate statistical method for your design: + +- `OLSAnalysis`: Standard regression for A/B tests +- `ClusteredOLSAnalysis`: Clustered standard errors for cluster-randomized designs +- `TTestClusteredAnalysis`: T-tests on cluster-aggregated data +- `GeeExperimentAnalysis`: GEE for correlated observations +- `SyntheticControlAnalysis`: Observational studies with synthetic controls + +#### 3. **AnalysisPlan** - Orchestrate your analysis -We welcome contributions! See our [Contributing Guidelines](CONTRIBUTING.md) for details on how to: -- Report bugs -- Suggest features -- Submit pull requests -- Write documentation +Define your complete analysis workflow: + +- Specify metrics (simple and ratio) +- Define variants and dimensions +- Configure hypothesis tests +- Generate comprehensive scorecards + +For **power analysis**, combine these with: + +- **Perturbator**: Simulate treatment effects for power calculations +- **PowerAnalysis**: Estimate statistical power and sample sizes --- -## πŸ“„ License +## πŸ› οΈ Advanced Features + +### Variance Reduction (CUPED/CUPAC) + +Reduce variance and detect smaller effects by leveraging pre-experiment data. Use historical metrics as covariates to control for pre-existing differences between groups. + +**Use cases:** -This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. +- Have pre-experiment metrics for your users/clusters +- Want to detect smaller treatment effects +- Need more sensitive tests with same sample size + +See the [CUPAC Example](https://david26694.github.io/cluster-experiments/cupac_example.html) for detailed implementation. + +### Cluster Randomization + +Handle experiments where randomization occurs at group level (stores, cities, regions) rather than individual level. Essential for managing spillover effects and operational constraints. + +See the [Cluster Randomization Guide](https://david26694.github.io/cluster-experiments/examples/cluster_randomization.html) for details. + +### Switchback Experiments + +Design and analyze time-based crossover experiments where the same units receive both control and treatment at different times. + +See the [Switchback Example](https://david26694.github.io/cluster-experiments/switchback.html) for implementation. --- diff --git a/docs/license.md b/docs/license.md new file mode 100644 index 00000000..1731b8a3 --- /dev/null +++ b/docs/license.md @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2022 David Masip + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/docs/quick_start_power_curve.png b/docs/quick_start_power_curve.png new file mode 100644 index 0000000000000000000000000000000000000000..b8df592e108e34e6098870c9ea85541ceb35b3ad GIT binary patch literal 26909 zcmbrmby$?$+6Ow6g@mG@AYmX#zDjq9A|Tx@Eiiqz_yl;^ zo_ct=x{LGkJN^3(J{LDTep~jMH{dO2T^|{`!(cQP(EmvaWb*A{u&1w;6y$WCr{bq5 zJWMd-ZGSk@Kdr5;@yR~pUNgy5rbwWCA)4?@gU{<_q;6(d1ARiIplANHmbVLIN;hhH zO6}2jwd%y<-tQ-<;gasuq{q`^0!qE^$kZ#%*mVXlCX6j(^iYHZ`meBZa_EPGjR*Rt zEXf(LOc+c9_T?hY6z#X8H~2|))6}TOA&*W?RE|b^HxubR zRpjb1;e@Iixo7z`BJ((vM&_{pDD^V8-lZ}3@lEvaFO*!$+b{f@`~1xB-McqwRaI5x zIb!Nv=zp-i(x3IJ?Hy|wf77l!NBZ;Z1Nda%$(;Z)#l8_<)MU0* zv{)OaAMNLTw3%@-G+F1qvnM&hevLv}#fQHqMMQtGHx1Jq7^s>qIflDP@-yfc%?lA& zNIpHU(bL~QKi8qEDk}y3w&7Bn-2(fYeKLe$W0yO(Z$F%|JlMwXuQjX>+;o@AlTdIcCKt;Jbc1uoqpowZ)u zAm*OwI`{naz+)fP3~B$2gZpEP8G&m}(|+q{=hK6}52M9bI71_!AU!S8{kr0L*KnZ> zW(DIlj*HG!W@)9S4Mnx+<9!^Lc~fH;HFMGKZu80I47biZD?f6=acREznvqH5!?%og zCJkM*5Joi{PE{?wD~0(hJtEkl`!rH>hE6b7w)3rIw7T%xSi>3z-8CeC^WgwqVS<)- z!UuWX@MV^-R|q!R*JZj)Nlk}!kL_Hd@Z)PB|MKWbqV4p~y?^HV&GgI6 z&_hTkW6$M*2j0^@`NB@)AJ_Nj`OQ{+qgMkce{M_j8&z}ec%JgrqSk39l#Djs#@Tdm zv$H?gnRI{GD>@rR8a{Jbu!Pt3Y-8J(rszg9pZUu>hNFHFD@-npS@FEa4zq1`A z&BeZ$kU8G95(z}YbmfDeASN=|B>_G-+GooVP!EI z);FexrjYbw9&Q8bB(doyvwl3f*eM_Euf|hgG2 zRHzTLUlq*4X&acr+=TOHJ)AF&imux1AEqXU^@N_3s=g|8Ah3l>>Tu?ua+Nf3&q;hJ zB}4rydU_N5=E0D*dR^A1VWeJQo!k86`to}7N&3OHmt^$x)MTsf@bw2jy+>ARUQr5r zU>s_&0x8_K88eNm{^h@&n;Eu`NrEh{umnX|ODG{X8J;M1JB4$XS6vc)BqL&@#>? zEIdrGHlXP^rXS$!o^?7cUQXx#&$ij6uj$z7k;cB79oK(S6fa+CvqRt7HZy(~f_ApT zKN~i*n-8F0mFns4_Gn!qb$2aM{1N;LpKHASZMx5x)(<;UX1+dM+(Ijev#EGunbqbi za!xBtp2GX=E-(#KS)}ZxD{2ObBNeD*A;)^Ndz6(4CT_1Mknpj4Bi#0ArI$NB()Srm zJ>EAst*7|+swD6sURC^L-1V7vn#T zM;`SWx#-Eu2OprTcd!dLjlNgz3_|~_dMTM(R5A{w!(E-FExAwq1t|MGP(@^&pf73n*&v4BN2|Y zvX^)&?9nqC!#1H*#cVWj0kw`wQ^v8Ew^R70!xaajS5JGzz@Y3V(9IjT03 zpAoPU93OhA!Qe>+OB#qgBNX(0rX<46?BzN3q?8P!y;)a9T{90HKDYa}lz8E4?rj^L zvy^B@&N@jf3>Vv&4-O2t`PIXpW)fTt464O>|1nVic1)isI@Hb@g;!ws>U;lO*LeGv z-XIdP4GJfmr{idO*2FThFBpA7A51EDhc+tU@K5{w(u0K*7cVJUq=1XW5xt(w!Kwg* z0ng#6-Ks1c(uH@Im8* zM)Q~QIwij<>|I0L{l57JQO z*67BU*awdK2PR`X^HR?(2@~sagz-}Wr)jnu&ajDJh=(cMqEtpcuB}MR%gw!|TV^JO zjW)>S$kEDEK6ugWRCReN3jfqlJ(#%;nZ}=DZY}k|D`e7Z+_ooq-iwcHGYfturswsj zpIJ!91ox8?#d%ryXe!lIm8^yaXE`%o)+Mas=8+FYu+Ec5$vwsu%qoR}F#Z$MDzUI|<(%J#v zO{4OwoIZog6Q{+QR!&o=7+c|=NObv7c_u`*FdIcEJC3CKfVaf> zoys&!XLLl9*3W5p*3zQm%R55-AZQS)frNuDLCd1-@$FP?2yi)au_4V&z+xT@^80^{^ zN=aCRk?^}#uivCyZl_1BpE2n%h%!1UesS{TGZy{4aBRRIS;4G~+pB1w_?-o7z%HB` zyWmI(_o^`CpyU(|L6pNee)vYi^GCdnH_TS;jMani<)`$1a-XO4%1`r3_KR-mMNk>v z^DFC)PoPkk_!TQbt>d&PRqpQK-T>F6VA{4;T-)&~c*jpoAI4i?5OcoK z&Z{Dw4!2jSaqUd+a7v=cKZdU5W(%eS>JUdS zxJlhZ78KlHMb73hmVH)uvE3_~ht}|C1v8&SVL{JPN5hRI<{o7FWeH{GE6#Q=K`IDu zANKt~3U9}H@i-~jr|i1yOk87T+P9Uc4Yg^Vg4>m5vXENc{{<3fDQ|_T=rx8g-fK@b zE#ZmT_^0Yg_rCE8IioMU^iid#$h7tx#g&hIL@|8!(s4RHt#=GO*z9C)mFh4m9Tlu5 zY?RnOVZuduD^x7}1AcP-9YK=ZmzJhY??H86s}pK!VPSnVx*oA_JM(>7+e%7!yU9Sb za%V(ZMEGm>_&``s(mk8bTCW;0LHg3|^hldXK7*y|mZwsW=|$nv?`xU~{i8+jVVsV} z6n(AJ-tXh1gNB`3>yD_MufbXtX6?tLzfN!aS*%x~+Zt<(6Dt-g%I(k}=m(sh7g2mb zhbOR5@dw9`7r$9m|7B^Z9F=gFlw-_I4yP_IcOshJv?#gin5(W z2Xbtru{8xAj`e!(M8TI56B}hx_`fJA2DhOvrcj&){m5lY@rhI&Q;w<&^ zU#~eP8$Y((o1)i%_eQFE^)(!Byo^Usm+Uh<`P@&RUH@mDmu`ZJZ#}c@PIJ=RBQE=)W`+Pw$3b9Mce{40~B_)n|haq^OuywaWU&=QGw= 
z2GY$e4WQJdlw6}L;q=uTl`uP@@qP`*k(P2PN}C6trE$YZ`k=L@1LvIvo0;jbeu@Rm z)dSNi;l7C~YtIR*4ZgJ}JUU~aVG=OQ$d3m6nN*wA*DGI6tHMNMuACg(Zjr)b!U?Ag z*S}6L=99qKw0=~P!$jF6DqY;4lyH3Y)q%kx=z;KCNXC9k8b8f;!8RSF#-4)(J!<<$ z5~|!eVJhKd;P-!4^a`7#G+ws85_LIg?iTBG^zx-ddfW613!bxOhNaB z$7Nm6YiMs6p`7t*wTP!QDlfl2rMk$#uxQUKozX{u(kNiR_@n9bgM(8HcqWr-7uSFP z!xWF3(C&Nb8<)++LlHJTmO^*VMWX<7@%2;e&usr!k=s;!<07^H@sGEq+i%6bZ(XB* z4hnK`+kiJ)(om$C#b>GY=Uo7YK*4B4>QwSIacb15NpL#Nc>F8V0pFcjJ&Du za-I=LH%VA`K&8ZW=Z=#1q!gI|p$yo7^`i$~-d6m&g1{*Xu_88@AzH->2RE`)1uy2X1bx83#ww&6V!w z3}4OprcUoW2OH@sX%%b;@?0JU0L90|#6*{rDP-t%YF*qg;xYpR2fyKm8uPFJw=zC7 zKQQ7~HTu+N>Sbf=n14`zfNZd~S!_;ALN8V{HM6G7Ya2a8dX0^=xZCW_(QGhM-n~oT zNqt;7bb@hXNT9<7@fjB zuhWTXS6^zxqSa%@M26;?kG*$0^y2wwGN~?{H~JDye+$vApW3D?D#hQqb*yY6rl}|S)lYRG}pv*-ER1H?cAFe+uFr(s(K0gmoIgGxxq757YERw zl{WqHU)0;4ogeSiFC;rQ@=*aB9qrW?cs9nDmoQe~OodLm#0no)p}WF|Y&J`1l(Sh+ zPi#a>)Q#Zp8aCZu${Pz9B^d91XEGv1^S}J3%6naczAN*DSV%lz)BIk>JX-clZ zou*O$gHtJ!wACEl#{dy~t05y>IC zeNVrKt%Ocpc&It=&G9=2NLRU>-O;WD;jo%DOOcGv^Bq$@|2sAz%9&Dkku#dZB`XIVbj^Qn_#CXO*vChFtS+ z6qdpHyNE|2+to3#eXBp8Uh&PqxfAEt{X4K^bY}z_7-)9=HUif)YDmKnVnSRM?}cs-3a|7Z zmarzVuW9Qs%i*;*4BtAkt1c?dVbnWsH!mCnV zZDC3iBFpB>}~GSH`uO$;uQv?zRspq)v0_vmOD?}{>99G| zpxZk4d)oZi6#ZxJF=|hZLW9T^&O&2GtON;q_iRrFRa>|AP9A)qv-EVUnv#0NA&khr z_!_)!92!-2h$45-(Yy8mqjW}{QRK)8bIgZdR-Rf_TzqU}BOevK;)0&GnL~c>GJVNZ zlO6Sd1Fs3{rgsA`&fjvd)A@Z^Y>KXC$1SgSu+=tmkOt9K@LU=bi-9LjHP%eHqd3~W zt<>I*o4R%@B~nLR?DL>FNyu&D^f)6&)~O#2M_O;bCk1}N!^8?FS5&HPq@#P4^kUE} zs7n%5j`C<#6KU-Aje$RC2-+RJ;(dIR$;@rTq8eC-D_`2Yyg_@35s!S0;1O#=tJ!h30=oteA%l zzbuzr&l4!SGMs7YEL)_*)dAM0AWXb|7Nii1r*b<2oqDHO=x%x<%s=xE^ykGG+{ zl4?jovuY7=MCIX%rAX|54IGjbx|>oy!gvC6D$=?=2EB;`dK0G}4w>&bSt7J1`G_vT z`2+T3$3JZ)9!nQfRBU8OdJ*PAoOeI4o2)Wh#194my*r*t$3a6`YGNW^*Sd!>I%xP= z{x0&M7eq;L3a@?dAE_Ft-()+gc$(w3L7wYX(^Y5F)c4fXRH*Ah4D9a?ao(E{aNFwr zI=}C*ZFB@&$qS#0S`$V#?%|`D(wJZNX)ui|km0y0AaiY2QpOw!V^O;MPBW+GcSq)@ zN9q^Wf;{vb!CwBp51DN^+|ek$DXFRrVY!0~rt0h+d^J?*do>N?MWg;ae1D-{$y~&Ayx4x}zSV8Yd-QrRphEVxkO^GN@-(Z3g31hMtM(i;jZ(pDY@A zv*czhmg(DWV@V1%4WD#XxNqNX1Ad@|5*jx9 zd+FnhfMk?fMxP0S_1Gle)UavtU`6lon84T4uw;D>HWDZa4?2^OZbM>^HvZPTa3jy= zm^o%e=TV0gjaC<<&6!8AnA602s2(+UW0_-`QmXVMHpz31=2dEgJiKs>4!W^H9NVLL)TeI=o}DpED_)Oa7r%pvI(Lkv7CY`dn5 zTR0w!8y-n~;rEUCrJV;%Q$8_G>1aJ;Z=3mP{W^~i#+jdeW5kEAvysSZUI!LVn_cM- zB3lJTE{0JAbR=@YUUS7&vgA7O-SRBbmwN`r=5&L`mgiy1;~Se>VDp#qdn=6Z?nakc zb||C)7f$6_s4hzy^aS$vKktHCg3d(NDDPR-@KdYwAe}7hY~SjlPgTbH-XwRV`G&z> zuq8vD?>8r#s!@Fh@8;2bx6pX`&6_(-ZgPI#zw+a-y9Dwr!5mJW~h*44zlJ<8J`<0CzXKJwfH513wkMd$?J@?at zJRY|yD@TF7mF0xitnu|Qfbx;Cn_rBci)tUb%tNbD&B+jxcDm3n2e^6I84GB=Z@~1- zE5Khn7T6GzmE1c`TJ$kSI-XrGSSNBneE2mhiMR~Y5QT1yW&_U89z<^@+XNA|xW`e0 zcl<~Po~d$ii@gLpG6OZ9TdtqV9Vt7R`EZnxy!rLbn)yn@j{==z)a|p^)z2A0C%H6OfU}40<(bkQ9V)-UhPKn zi(S{_5GwxF`^v+zs#95M4toctaO1e143@ocs~jrszS=Z=G_%dA7b+>K9ZmFS5$U7I z)!l!w;J|Bk;w@XLi&KLe8QRqpX8nyP$DyRMxwHIIFt%S%eMb`@ytvNZsLfI2J7!G< z@>Z~?rJ*?FdipCts~Y-}iT4Noub)+jH`&{ux=}q3Ds}NN*pFAu`>>!08s(Abq#Gw@ z6TG6P0=91)KB^8ENiND&26>poDz7P2}57On>>0Am~-8FB|A_Hs9(<(tZGoXyq@KK`nPhK8s9AvSd2cr$ruJxi1D1DVfonI z+r64OLJB)01NKMuYmt_C-6Pi1un0cTvV|#>bi=VE>!{Gyf93vf5Vv8nyP0-pVYKfd zYku`mwHJ-vxf(}`0zgx6FA5~vz~X72t-eqiiApMYY#y!VVyMe+GTlpHo3}xkuVFr? 
zfv1Cl|2a%K54>4lr#pJhOdd^Oy})alZBIvOOg zK&S>3JTA=k4B6E0%qDFd z@-Uz|eb1}OY}X>I;iylW5w{S_ME4N}dk_rXd}bnZYl$JyVYEzT(g~+pzpxu-mOufE zV1r~h+R^a`XP(hLFgj$jw*6`mse{eI`Iyqf*uFqpX7Mdyc7%Doum7~R-uH(SQqL?L zk8<3Z%eqKm3T)7Z!L9e)Jh~(XxAvki&dcj|d``tcHe2;PFxikB(9Go@BGpA2#dNW~ zhQ%SZSEK?B{nCEa*N@ncz;0ZDCcoijv+sdfQUWoFN;&EsrWh^0^3_(ik&hBaI|Bl1 zPz>_sO=Gg@#DXFA6rCH_)mrzx2JXJ)B2MxgG|B6ZxKbX1pMUTtzKLutr9Zw$oxLk2 zWx>!G+2qg|AAn=iS)VCPMmwZ*#y{#WI5r(ZdWjL2aRFM!T**XwsadO*P#~TfN~wR4 z)fYW@dygomdyt$S8P?b&_)@E2K2S;M);#{uM!j0}P*>RxlGHh9tFK$_eG)X4_Gr|D zPv<4iD=+oPnd+h}m55{9fyTJSW7K?KD*Tf}57GluoS0qls=ls;92TSxWe#A`yEs zo*QuUwu9e8q%r-Ap*T|=VfN3sN{~cAFIPVV`N*lL)Pm~A8TCWOnKMV?g&z0hfMCQS z0bA}2HhR|kZduz{FBbUaDElJR74vRw@Pq3{yM5#!Q8Mo{%VC;M&F0B4=MX8*KG|N) zoC^_!miB}dl1{~*$3EB8Mz7$W+o5=sr;yanyKS(S1h4?uxw+-Qf~ctTQ|^6I5Yhyd zoEL6BT-~X_1{@%usu#kY63heYZ2QtpOKM1A(cKNP@wf$4eto8g#X+*~@x5RmzG|O+ z^OX*cjG+?4DnYloM^r4527pA5 z12pNU7Au;Mwo@2|9C87x#@usd=m!EdT|~f*IL>?tO&_9$aW-Z(^9(mjk4Z6RVt){} zMkx@Nn3!fg4Pmg5wbZ;i_eCxVi3zG=%B9hY9Ec+^KR3rIyuE%+ztlvGA?mhz`YOxG zOgL+iPN5c`C^J_^A4eI5!c8=D_j7M0ovZ%E2(WAFJ z%C~AGuQ0|jzir!oq8LXWDU}R&Vgn8d}P0jmYBR+Z z3$oJq@f!EN8*{$#@Fq=5OUovwmlO2S8{>^WRZL#qCr3NQP7_F990KLBGv$o~R0^$c zZ!Xg;F9IIbeh`(`HJ)0)q7bZFKeF0#^fOHn8D*&>@u-p0RzRIe}f|`BlQd9VF)__q&)XwMAtV>13 zrQ*36jxK7PjdEFfD!2ZMsMOU7KwHgg2iBjalglTk0(KAeR;Cc8@~X8+m)gvKgSx-I zl=w>-|IBj06qUArU*M%MSbtBE2YMKDekHtg(?I(sCxp-UIhPsL6d5?4f~}e(`4v%U zktqsxWL#{Lty-@yvQKLLq>=m`WZRvXwcd2loAwc@ zo*`X_e!RTAytbd0U{(Y=;rgtBhZTVN#4p0r=NrdY`;)Y+^E7Prefs=hjn!>jv7zI~ zD-6X6eqQ}`PL9uk*Mcc_SBa^#(c71o520MlQI~e#icA*_s!8-rjQ}2+)bZ|YJs`JD zo%50KUi0Aoi7>2cBPfr(hSPXHI}MZl0VU_UIk#~mp-dkyu@;Z)dkp+h<%0YnY=$+y z#M}OR%=NWO6u@)l8~i)i6KW|z764c88we7)ZU1FJRaq%&@OM+1r_Oz9OArde^C3T- z4l16absU|L{~mpu+M5xkqlN;jsSTp)y@$dHbw?ZX)fJ%^!N(?O)p@Opg7ty=HKn@4 zkgFr+v$9!;iTG0PusnyEdS*~KDp{TO(j9cjLjzYClnKSsb2jtwrYm9|jq%QpwQKHm zFiI2@GwSipLncd31!m!@5$H5o$Q|~nzD1|_#aXmaL5hR^qd$-@3x*=KXeLOI8ZxV$ z{lXY^xR!lC7A5e`5T9lSPnVm!>p5wISTkNLpsiXQKfLtPR431)5(-8dJdnF39jpB5 z|L}4Du&JSgZdKGn!g|p;b~@q&EurFK)#}%PYW)V`_4xbu2mn_uzvXTyfWIb(G$PXg ziZy$ONY_`VqEmBC4W&IDHRM1<)GNGoYzhh?nv)|KtS%t6K~4W%Qc3sfHRh4rN;5o0 zll|f>FMI;Ld}j)9bZ>Mw>QWYX(J` zY2R^98bA`D8N$@T_-vhMDT}1FHTieFxi(m*G!W+=8$X|pWRZxpIr+Fc4 zoO8U}HPv*oO22|dH$hUTtprl1rGP8_ZjedM^5#IqyhZwL9S+xLI}#Hawp?gUUu-}k zGtn?eSl-gD3|&jF&qeU9YIH%0b;bt>CF04|G3lwTVI5E>A%0m|ZDn+g;{fVL+_!Sj z+6g)n2aHV!7+by%E)Fs_U$6cOdhA%Ge7ntFp}BWuJTsfP3NL6Ca>DD~IvE$}1l{NL;*_FC zzJo-jNzI4oEO?-0$d_Q#v!+|Mg}i&bbUBHMOo7eYN}>gg0t=EQH>eUa)j!Wv_6_G< zP0;~Me zUlM0LRpg!GWm{xeGr%ZNUd$DzOx(#EkQBaj580$(yuXq=sr2ck7}*FF1v12`C_y;^S-!=sg7Z&yy*6!6_!h1ydajL&TB;*`{W6K{ZY;mcS4%@JJDkJyjtaXC3mdPDuv zX*%~_%gh1`r#!JCVLSgq#12u)B=I1r=~~)ob*0%FMziDWGoq9fK*A*(abVfzFm<@C zW3#x60oamMppPGWTRU+tFT4YLd42U&2e-$;!C>5MLa(F#eRcDHr4+C?WMBsJL!(2@ z9_xx#lY~HzB1AhM907na;x;(|17ssdA5UmWZOof^qt`V(g?3Zp->`#pUZVusWJ~E? 
[GIT binary patch: base85-encoded binary image data omitted]

diff --git a/docs/quickstart.md b/docs/quickstart.md
index edb77553..ee06e296 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -18,102 +18,66 @@ pip install cluster-experiments
 
 ---
 
-## Your First Analysis (5 minutes)
+## 1. Your First Analysis
+
+Let's analyze a simple A/B test with multiple metrics. This is the most common use case. *See [Simple A/B Test](examples/simple_ab_test.html) for a complete walkthrough.*
 
-Let's analyze a simple A/B test with multiple metrics. This is the most common use case.
 
 ```python
 import pandas as pd
 import numpy as np
-from cluster_experiments import (
-    AnalysisPlan, SimpleMetric, RatioMetric,
-    Variant, HypothesisTest
-)
+from cluster_experiments import AnalysisPlan, Variant
 
-# Simulate experiment data
+# 1. Set seed for reproducibility
 np.random.seed(42)
-n_users = 5000
-data = pd.DataFrame({
-    'user_id': range(n_users),
-    'variant': np.random.choice(['control', 'treatment'], n_users),
-    'orders': np.random.poisson(2.5, n_users),
-    'visits': np.random.poisson(10, n_users),
+# 2. Create simulated data
+N = 1_000
+df = pd.DataFrame({
+    "variant": np.random.choice(["control", "treatment"], N),
+    "orders": np.random.poisson(10, N),
+    "visits": np.random.poisson(100, N),
+})
+# Add some treatment effect to orders
+df.loc[df["variant"] == "treatment", "orders"] += np.random.poisson(1, df[df["variant"] == "treatment"].shape[0])
+
+df["converted"] = (df["orders"] > 0).astype(int)
+df["cost"] = np.random.normal(50, 10, N)  # New metric: cost
+df["clicks"] = np.random.poisson(200, N)  # New metric: clicks
+
+# 3. Define your analysis plan
+plan = AnalysisPlan.from_metrics_dict({
+    "metrics": [
+        {"name": "orders", "alias": "revenue", "metric_type": "simple"},
+        {"alias": "conversion", "metric_type": "ratio", "numerator_name": "converted", "denominator_name": "visits"},
+        {"name": "cost", "alias": "avg_cost", "metric_type": "simple"},
+        {"alias": "ctr", "metric_type": "ratio", "numerator_name": "clicks", "denominator_name": "visits"}
+    ],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False}
+    ],
+    "variant_col": "variant",
+    "analysis_type": "ols"
 })
 
-# Add treatment effect
-data.loc[data['variant'] == 'treatment', 'orders'] += np.random.poisson(0.5, (data['variant'] == 'treatment').sum())
-data['converted'] = (data['orders'] > 0).astype(int)
-
-# Define metrics by type and category
-absolute_metrics = {
-    "orders": "revenue"  # metric_name: category
-}
-
-ratio_metrics = {
-    "conversion_rate": {
-        "category": "conversion",
-        "components": ["converted", "visits"]  # [numerator, denominator]
-    }
-}
-
-# Define variants
-variants = [
-    Variant("control", is_control=True),
-    Variant("treatment", is_control=False)
-]
-
-# Build hypothesis tests from metric definitions
-hypothesis_tests = []
-
-# 1. Ratio metrics: use delta method for proper ratio analysis
-for metric_name, config in ratio_metrics.items():
-    metric = RatioMetric(
-        alias=f"{config['category']}__{metric_name}",
-        numerator_name=config['components'][0],
-        denominator_name=config['components'][1]
-    )
-    hypothesis_tests.append(
-        HypothesisTest(
-            metric=metric,
-            analysis_type="delta",
-            analysis_config={
-                "scale_col": metric.denominator_name,
-                "cluster_cols": ["user_id"]
-            }
-        )
-    )
-
-# 2. Absolute metrics: use standard OLS
-for metric_name, category in absolute_metrics.items():
-    metric = SimpleMetric(
-        alias=f"{category}__{metric_name}",
-        name=metric_name
-    )
-    hypothesis_tests.append(
-        HypothesisTest(
-            metric=metric,
-            analysis_type="ols"
-        )
-    )
-
-# Create and run analysis plan
-analysis_plan = AnalysisPlan(
-    tests=hypothesis_tests,
-    variants=variants,
-    variant_col='variant'
-)
-
-# Run analysis
-results = analysis_plan.analyze(data)
-print(results.to_dataframe())
+# 4. Run analysis on your dataframe
+results = plan.analyze(df)
+print(results.to_dataframe().head())
 ```
 
-**Output:** A comprehensive scorecard with treatment effects, confidence intervals, and p-values!
+**Output:**
+```
+  metric_alias control_variant_name treatment_variant_name  control_variant_mean  treatment_variant_mean analysis_type           ate  ate_ci_lower  ate_ci_upper       p_value     std_error     dimension_name dimension_value  alpha
+0      revenue              control              treatment              9.973469               10.994118           ols  1.020648e+00  6.140829e-01  1.427214e+00  8.640027e-07  2.074351e-01  __total_dimension           total   0.05
+1   conversion              control              treatment              1.000000                1.000000           ols -4.163336e-16 -5.971983e-16 -2.354689e-16  6.432406e-06  9.227960e-17  __total_dimension           total   0.05
+2     avg_cost              control              treatment             49.463206               49.547386           ols  8.417999e-02 -1.222365e+00  1.390725e+00  8.995107e-01  6.666166e-01  __total_dimension           total   0.05
+3          ctr              control              treatment            199.795918              199.692157           ols -1.037615e-01 -1.767938e+00  1.560415e+00  9.027376e-01  8.490855e-01  __total_dimension           total   0.05
+```
 
 ---
 
-## Understanding Your Results
+### 1.1. Understanding Your Results
 
 The results dataframe includes:
 
@@ -127,46 +91,72 @@ The results dataframe includes:
 | `p_value` | Statistical significance (< 0.05 = significant) |
 
 !!! tip "Interpreting Results"
-    - **p_value < 0.05**: Result is statistically significant
-    - **Confidence interval**: If it doesn't include 0, effect is significant
+    - **p_value < 0.05**: Result is statistically significant (95% confidence)
+    - **Confidence interval**: If it doesn't include 0, effect is significant (95% confidence)
+
 
 ---
 
-## Common Use Cases
+### 1.2. Analysis Extensions: Ratio Metrics
 
-### 1. Analyzing an Experiment
+`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value), as seen in the first example:
+
+```python
+# Ratio metric: conversions / visits
+{
+    'alias': 'conversion_rate',
+    'metric_type': 'ratio',
+    'numerator_name': 'converted',    # Numerator column
+    'denominator_name': 'visits'      # Denominator column
+}
+```
+
+The library automatically handles the statistical complexities of ratio metrics using the Delta Method.
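+
+For intuition, a minimal sketch of what the Delta Method does (a first-order approximation, not necessarily the library's exact implementation): the variance of a ratio of means $\bar{N}/\bar{D}$ is approximated as
+
+$$
+\operatorname{Var}\!\left(\frac{\bar{N}}{\bar{D}}\right) \approx \frac{1}{\bar{D}^{2}}\left[\operatorname{Var}(\bar{N}) - 2\,\frac{\bar{N}}{\bar{D}}\operatorname{Cov}(\bar{N},\bar{D}) + \frac{\bar{N}^{2}}{\bar{D}^{2}}\operatorname{Var}(\bar{D})\right],
+$$
+
+so the reported standard errors account for variability in both the numerator and the denominator.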
tip "Interpreting Results" - - **p_value < 0.05**: Result is statistically significant - - **Confidence interval**: If it doesn't include 0, effect is significant + - **p_value < 0.05**: Result is statistically significant (95% confidence) + - **Confidence interval**: If it doesn't include 0, effect is significant (95% confidence) + --- -## Common Use Cases +#### 1.2. Analysis Extensions: Ratio Metrics -### 1. Analyzing an Experiment +`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value), as seen in the first example: + +```python +# Ratio metric: conversions / visits +{ + 'alias': 'conversion_rate', + 'metric_type': 'ratio', + 'numerator_name': 'converted', # Numerator column + 'denominator_name': 'visits' # Denominator column +} +``` + +The library automatically handles the statistical complexities of ratio metrics using the Delta Method. -**When:** You've already run your experiment and have the data. +#### 1.3. Analysis Extensions: Multi-dimensional Analysis -**Example:** See [Simple A/B Test](examples/simple_ab_test.html) for a complete walkthrough. +Slice your results by dimensions (e.g., city, device type): ```python -# Use AnalysisPlan with your experiment data -results = analysis_plan.analyze(experiment_data) +analysis_plan = AnalysisPlan.from_metrics_dict({ + 'metrics': [...], + 'variants': [...], + 'variant_col': 'variant', + 'dimensions': [ + {'name': 'city', 'values': ['NYC', 'LA', 'Chicago']}, + {'name': 'device', 'values': ['mobile', 'desktop']}, + ], + 'analysis_type': 'ols', +}) ``` +Results will include treatment effects for each dimension slice. + --- -### 2. Power Analysis (Sample Size Planning) +## 2. Power Analysis + +Before running an experiment, it's crucial to know how long it needs to run to detect a significant effect. +See the [Power Analysis Guide](power_analysis_guide.html) for more complex designs (switchback, cluster randomization) and simulation methods. -**When:** You're designing an experiment and need to know how many users/time you need. +### 2.1. MDE -**Example:** Calculate power or Minimum Detectable Effect (MDE). +Calculate the Minimum Detectable Effect (MDE) for a given sample size ($), $/alpha$ and $\beta$. parameters. ```python -import numpy as np import pandas as pd +import numpy as np from cluster_experiments import NormalPowerAnalysis -# Create historical data +# Create sample historical data np.random.seed(42) +N = 500 historical_data = pd.DataFrame({ - 'user_id': range(500), - 'metric': np.random.normal(100, 20, 500), - 'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, 500), unit='d') + 'user_id': range(N), + 'metric': np.random.normal(100, 20, N), + 'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d') }) -# Define your analysis setup power_analysis = NormalPowerAnalysis.from_dict({ 'analysis': 'ols', 'splitter': 'non_clustered', @@ -174,183 +164,53 @@ power_analysis = NormalPowerAnalysis.from_dict({ 'time_col': 'date' }) -# Calculate MDE for 80% power mde = power_analysis.mde(historical_data, power=0.8) -print(f"Need {mde:.2f} effect size for 80% power") - -# Or calculate power for a given effect size -power = power_analysis.power_analysis(historical_data, average_effect=5.0) -print(f"Power: {power:.1%}") +print(f"Minimum Detectable Effect: {mde}") +Minimum Detectable Effect: 4.935302024560818 ``` -**Learn more:** See [Power Analysis Guide](power_analysis_guide.html) for detailed explanation. - ---- - -### 3. 
 ---
 
-### 2. Power Analysis (Sample Size Planning)
+## 2. Power Analysis
+
+Before running an experiment, it's crucial to know how long it needs to run to detect a significant effect.
+See the [Power Analysis Guide](power_analysis_guide.html) for more complex designs (switchback, cluster randomization) and simulation methods.
 
-**When:** You're designing an experiment and need to know how many users/time you need.
+### 2.1. MDE
 
-**Example:** Calculate power or Minimum Detectable Effect (MDE).
+Calculate the Minimum Detectable Effect (MDE) for a given sample size, $\alpha$, and $\beta$ parameters.
 
 ```python
-import numpy as np
 import pandas as pd
+import numpy as np
 from cluster_experiments import NormalPowerAnalysis
 
-# Create historical data
+# Create sample historical data
 np.random.seed(42)
+N = 500
 historical_data = pd.DataFrame({
-    'user_id': range(500),
-    'metric': np.random.normal(100, 20, 500),
-    'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, 500), unit='d')
+    'user_id': range(N),
+    'metric': np.random.normal(100, 20, N),
+    'date': pd.to_datetime('2025-10-01') + pd.to_timedelta(np.random.randint(0, 30, N), unit='d')
 })
 
-# Define your analysis setup
 power_analysis = NormalPowerAnalysis.from_dict({
     'analysis': 'ols',
     'splitter': 'non_clustered',
@@ -174,183 +164,53 @@ power_analysis = NormalPowerAnalysis.from_dict({
     'time_col': 'date'
 })
 
-# Calculate MDE for 80% power
 mde = power_analysis.mde(historical_data, power=0.8)
-print(f"Need {mde:.2f} effect size for 80% power")
-
-# Or calculate power for a given effect size
-power = power_analysis.power_analysis(historical_data, average_effect=5.0)
-print(f"Power: {power:.1%}")
+print(f"Minimum Detectable Effect: {mde}")
+# Minimum Detectable Effect: 4.935302024560818
 ```
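+
+As a rough sketch of what the normal approximation does (a simplification, not the library's exact code path), the MDE at significance level $\alpha$ and target power $1-\beta$ scales the standard error of the effect estimate $\hat{\tau}$, and the same relation can be inverted to get power for a fixed effect $\Delta$:
+
+$$
+\mathrm{MDE} = \left(z_{1-\alpha/2} + z_{1-\beta}\right)\cdot \widehat{\mathrm{se}}\left(\hat{\tau}\right),
+\qquad
+\mathrm{power}(\Delta) \approx \Phi\!\left(\frac{|\Delta|}{\widehat{\mathrm{se}}\left(\hat{\tau}\right)} - z_{1-\alpha/2}\right)
+$$
+
+Back-of-the-envelope check: the output above implies $\widehat{\mathrm{se}} \approx 4.935 / (1.96 + 0.84) \approx 1.76$, which gives power $\approx \Phi(3.5/1.76 - 1.96) \approx 0.51$ for an effect of 3.5, consistent with the next section.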
 
-**Learn more:** See [Power Analysis Guide](power_analysis_guide.html) for detailed explanation.
-
----
-
-### 3. Cluster Randomization
-
-**When:** Randomization happens at group level (stores, cities) rather than individual level.
-
-**Why:** Required when there are spillover effects or operational constraints.
-
-**Example:**
+### 2.2. Calculate Power
 
+Calculate the statistical power for a specific effect size you expect to see.
+
+```python
+power = power_analysis.power_analysis(historical_data, average_effect=3.5)
+print(f"Power: {power}")
+# Power: 0.510914982752414
+```
+
+### 2.3. Visualize Power Curve
+
+It's helpful to visualize how power changes with effect size.
+
 ```python
-import pandas as pd
-import numpy as np
-from cluster_experiments import AnalysisPlan
-
-# Simulate store-level experiment data
-np.random.seed(42)
-n_stores = 50
-transactions_per_store = 100
-
-data = []
-for store_id in range(n_stores):
-    variant = np.random.choice(['control', 'treatment'])
-    n_trans = np.random.poisson(transactions_per_store)
-
-    store_data = pd.DataFrame({
-        'store_id': store_id,
-        'variant': variant,
-        'purchase_amount': np.random.normal(50, 20, n_trans)
-    })
-    data.append(store_data)
-
-experiment_data = pd.concat(data, ignore_index=True)
-
-# Use clustered_ols for cluster-randomized experiments
-analysis_plan = AnalysisPlan.from_metrics_dict({
-    'metrics': [{'alias': 'revenue', 'name': 'purchase_amount', 'metric_type': 'simple'}],
-    'variants': [
-        {'name': 'control', 'is_control': True},
-        {'name': 'treatment', 'is_control': False},
-    ],
-    'variant_col': 'variant',
-    'analysis_type': 'clustered_ols',  # ← Key difference!
-    'analysis_config': {
-        'cluster_cols': ['store_id']  # ← Specify clustering variable
-    }
-})
-
-results = analysis_plan.analyze(experiment_data)
-print(results.to_dataframe())
-```
-
-**Learn more:** See [Cluster Randomization Example](examples/cluster_randomization.html).
-
----
-
-### 4. Variance Reduction (CUPAC/CUPED)
-
-**When:** You have pre-experiment data and want to reduce variance for more sensitive tests.
-
-**Benefits:** Detect smaller effects with same sample size.
-
-**Example:**
-
-```python
-import pandas as pd
-import numpy as np
-from cluster_experiments import (
-    AnalysisPlan, TargetAggregation, HypothesisTest,
-    SimpleMetric, Variant
-)
-
-# Simulate experiment data
-np.random.seed(42)
-n_customers = 1000
-
-experiment_data = pd.DataFrame({
-    'customer_id': range(n_customers),
-    'variant': np.random.choice(['control', 'treatment'], n_customers),
-    'order_value': np.random.normal(100, 20, n_customers),
-    'customer_age': np.random.randint(20, 60, n_customers),
-})
+import matplotlib.pyplot as plt
 
-# Simulate pre-experiment data (historical)
-pre_experiment_data = pd.DataFrame({
-    'customer_id': range(n_customers),
-    'order_value': np.random.normal(95, 25, n_customers),  # Historical order values
-})
-
-# Define CUPAC model using pre-experiment data
-cupac_model = TargetAggregation(
-    agg_col="customer_id",
-    target_col="order_value"
-)
-
-# Create hypothesis test with CUPAC
-test = HypothesisTest(
-    metric=SimpleMetric(alias="revenue", name="order_value"),
-    analysis_type="clustered_ols",
-    analysis_config={
-        "cluster_cols": ["customer_id"],
-        "covariates": ["customer_age", "estimate_order_value"],
-    },
-    cupac_config={
-        "cupac_model": cupac_model,
-        "target_col": "order_value",
-    },
+# Calculate power for multiple effect sizes
+effect_sizes = [2.0, 4.0, 6.0, 8.0, 10.0]
+power_curve = power_analysis.power_line(
+    historical_data,
+    average_effects=effect_sizes
 )
 
-plan = AnalysisPlan(
-    tests=[test],
-    variants=[Variant("control", is_control=True), Variant("treatment", is_control=False)],
-    variant_col="variant",
-)
-
-# Analyze with both experiment and pre-experiment data
-results = plan.analyze(experiment_data, pre_experiment_data)
-print(results.to_dataframe())
-```
-
-**Learn more:** See [CUPAC Example](cupac_example.html).
-
----
-
-## Ratio Metrics
-
-`cluster-experiments` has built-in support for ratio metrics (e.g., conversion rate, average order value), as seen in the first example:
-
-```python
-# Ratio metric: conversions / visits
-{
-    'alias': 'conversion_rate',
-    'metric_type': 'ratio',
-    'numerator_name': 'converted',    # Numerator column
-    'denominator_name': 'visits'      # Denominator column
-}
-```
-
-The library automatically handles the statistical complexities of ratio metrics using the Delta Method.
-
----
+# Plotting
+# power_line returns a mapping of average effect -> estimated power
+plt.figure(figsize=(10, 6))
+plt.plot(list(power_curve.keys()), list(power_curve.values()), marker='o')
+plt.title('Power Analysis: Effect Size vs Power')
+plt.xlabel('Effect Size')
+plt.ylabel('Power')
+plt.grid(True)
+plt.show()
+```
 
-## Multi-Dimensional Analysis
+![Power Analysis Curve](quick_start_power_curve.png)
 
-Slice your results by dimensions (e.g., city, device type):
-
-```python
-analysis_plan = AnalysisPlan.from_metrics_dict({
-    'metrics': [...],
-    'variants': [...],
-    'variant_col': 'variant',
-    'dimensions': [
-        {'name': 'city', 'values': ['NYC', 'LA', 'Chicago']},
-        {'name': 'device', 'values': ['mobile', 'desktop']},
-    ],
-    'analysis_type': 'ols',
-})
-```
-
-Results will include treatment effects for each dimension slice!
 
 ---
 
-## Quick Reference
+## 3. Quick Reference
 
-### Analysis Types
+### 3.1. Analysis Types
 
 Choose the appropriate analysis method:
 
@@ -362,27 +222,52 @@ Choose the appropriate analysis method:
 | `mlm` | Multi-level/hierarchical data |
 | `synthetic_control` | Observational studies, no randomization |
 
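+For example (a minimal sketch, assuming randomization happened at the `store_id` level), a cluster-randomized experiment pairs `clustered_ols` with the clustering columns via `analysis_config`:
+
+```python
+from cluster_experiments import AnalysisPlan
+
+plan = AnalysisPlan.from_metrics_dict({
+    "metrics": [{"name": "purchase_amount", "alias": "revenue", "metric_type": "simple"}],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False}
+    ],
+    "variant_col": "variant",
+    "analysis_type": "clustered_ols",                   # cluster-robust OLS
+    "analysis_config": {"cluster_cols": ["store_id"]}   # randomization unit
+})
+```
+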
-### Dictionary vs Class-Based API
 
-Two ways to define analysis plans:
+### 3.2. Dictionary vs Class-Based API
+
+`cluster-experiments` offers two ways to define analysis plans, catering to different needs:
+
+#### 3.2.1. Dictionary Configuration
+
+Best for storing configurations in YAML/JSON files and automated pipelines.
 
-**Dictionary (simpler):**
 ```python
-plan = AnalysisPlan.from_metrics_dict({...})
+config = {
+    "metrics": [
+        {"name": "orders", "alias": "revenue", "metric_type": "simple"},
+        {"alias": "conversion", "metric_type": "ratio", "numerator_name": "converted", "denominator_name": "visits"}
+    ],
+    "variants": [
+        {"name": "control", "is_control": True},
+        {"name": "treatment", "is_control": False}
+    ],
+    "variant_col": "variant",
+    "analysis_type": "ols"
+}
+
+plan = AnalysisPlan.from_metrics_dict(config)
 ```
 
-**Class-based (more control):**
+#### 3.2.2. Class-Based API
+
+Best for exploration and custom extensions.
+
 ```python
 from cluster_experiments import HypothesisTest, SimpleMetric, Variant
 
+# Explicitly define objects
+revenue_metric = SimpleMetric(name="orders", alias="revenue")
+control = Variant("control", is_control=True)
+treatment = Variant("treatment", is_control=False)
+
 plan = AnalysisPlan(
-    tests=[HypothesisTest(metric=SimpleMetric(...), ...)],
-    variants=[Variant(...)],
+    tests=[HypothesisTest(metric=revenue_metric, analysis_type="ols")],
+    variants=[control, treatment],
     variant_col='variant'
 )
 ```
 
----
+
 
 ## Next Steps
 
diff --git a/mkdocs.yml b/mkdocs.yml
index 89fc1c59..79d40e9f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -10,30 +10,10 @@ site_dir: site
 
 nav:
   - Home: index.md
-  - Quickstart: quickstart.md
-  - Power Analysis: normal_power_lines.ipynb
-
-  - API Reference:
-    - Experiment Analysis:
-      - Analysis Plan: api/analysis_plan.md
-      - Analysis Results: api/analysis_results.md
-      - Experiment Analysis Methods: api/experiment_analysis.md
-      - Hypothesis Test: api/hypothesis_test.md
-    - Metrics & Variants:
-      - Metric: api/metric.md
-      - Variant: api/variant.md
-      - Dimension: api/dimension.md
-    - Power Analysis:
-      - Power Analysis: api/power_analysis.md
-      - Power Config: api/power_config.md
-    - Randomization:
-      - Splitters: api/random_splitter.md
-    - Variance Reduction:
-      - CUPAC Model: api/cupac_model.md
-    - Switchback:
-      - Washover: api/washover.md
-    - Perturbators: api/perturbator.md
-
+  - Quickstart:
+    - Quickstart: quickstart.md
+    - Power Analysis Guide: normal_power_lines.ipynb
+
   - Examples:
     - Basic Usage:
       - Simple A/B Test: examples/simple_ab_test.ipynb
@@ -59,8 +39,30 @@ nav:
     - Multiple Treatments: multivariate.ipynb
     - Synthetic Control: synthetic_control.ipynb
     - Custom Classes: create_custom_classes.ipynb
-
+
+  - API Reference:
+    - Experiment Analysis:
+      - Analysis Plan: api/analysis_plan.md
+      - Analysis Results: api/analysis_results.md
+      - Experiment Analysis Methods: api/experiment_analysis.md
+      - Hypothesis Test: api/hypothesis_test.md
+    - Metrics & Variants:
+      - Metric: api/metric.md
+      - Variant: api/variant.md
+      - Dimension: api/dimension.md
+    - Power Analysis:
+      - Power Analysis: api/power_analysis.md
+      - Power Config: api/power_config.md
+    - Randomization:
+      - Splitters: api/random_splitter.md
+    - Variance Reduction:
+      - CUPAC Model: api/cupac_model.md
+    - Switchback:
+      - Washover: api/washover.md
+    - Perturbators: api/perturbator.md
+
   - Contributing: CONTRIBUTING.md
+  - License: license.md
 
 extra:
   social:
@@ -88,9 +90,9 @@ theme:
   features:
     - content.tabs
     - content.code.annotate
+    - content.code.copy
     - navigation.instant
     - navigation.tracking
-    - navigation.sections
     - navigation.top
   palette:
     primary: indigo