Stata Project Template for Reproducible Research

A comprehensive template repository for reproducible Stata analysis projects using modern workflow tools and best practices. This template integrates statacons for dependency management with IPA's Data Cleaning Guide and Stata coding standards, along with established practices from leading development economics research groups.

Warning

NEVER COMMIT DATA FILES TO GITHUB.

NEVER USE AI ASSISTANTS WITH PERSONALLY IDENTIFIABLE DATA.

YOU ARE REQUIRED TO REMOVE IDENTIFYING INFORMATION BEFORE CONNECTING AI ASSISTANTS OR STORING DATA IN ANY UNENCRYPTED LOCATION.

Objectives

[Include a section describing the objectives of this repository.]

Directory Structure

[Include a description of the directory structure]

Development Setup

This repository is meant to be run from either Stata or VS Code. To run from Stata, use the do-files within the scripts/do subdirectory.

Our recommended approach is to use VS Code. To set up your coding environment in VS Code to work with Stata, follow these instructions:

Ensure that Stata is installed and licensed. Note where Stata is installed on your operating system.

Common Locations:

Windows: C:\Program Files\Stata18\StataSE-64.exe
macOS: /Applications/Stata/StataSE.app/Contents/MacOS/StataSE
Linux: /usr/local/stata18/stata-se

Copy the .env-example file to .env and set the STATA_CMD and STATA_EDITION environment variables to match your operating system and version of Stata.

# Common Stata installation paths
# Windows, Stata 18
STATA_CMD='C:\Program Files\Stata18\StataSE-64.exe'
STATA_EDITION='se'

# Windows, Stata 19
# STATA_CMD='C:\Program Files\StataNow19\StataSE-64.exe'
# macOS
# STATA_CMD='/Applications/Stata/StataSE.app/Contents/MacOS/StataSE'
# Linux or WSL
# STATA_CMD='/usr/local/stata18/stata-se'
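
The settings above can be sanity-checked before running anything else. The following is a minimal sketch, not part of the template: the function names parse_env and check_stata_env are hypothetical, and the set of accepted editions (be, se, mp) is an assumption based on the editions Stata ships.

```python
from pathlib import Path

# Assumption: these are the edition codes the tooling accepts.
VALID_EDITIONS = {"be", "se", "mp"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY='value' lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def check_stata_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means it looks fine."""
    problems: list[str] = []
    cmd = values.get("STATA_CMD", "")
    edition = values.get("STATA_EDITION", "")
    if not cmd:
        problems.append("STATA_CMD is not set")
    elif not Path(cmd).is_file():
        problems.append(f"STATA_CMD does not point to a file: {cmd}")
    if edition.lower() not in VALID_EDITIONS:
        problems.append(f"STATA_EDITION is not one of {sorted(VALID_EDITIONS)}: {edition!r}")
    return problems
```

Calling check_stata_env(parse_env(Path(".env").read_text())) from the repository root would list anything that looks wrong; an empty list suggests the two variables are usable.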

Make sure that you have Just installed:

# Windows
winget install --id Casey.Just -e
# macOS/Linux using Homebrew
brew install just

Run the following command to install the necessary software:

just get-started

This ensures that the following are installed or set up:

uv for Python environment management
Git for version control
GitHub CLI for interaction with GitHub
Quarto for literate programming, writing reports, and generating presentations
markdownlint-cli2 for formatting Markdown documents (including Quarto Markdown)
A Python virtual environment, created by uv in .venv/ at the root of this repository
nbstata, installed within the Python virtual environment, which lets you send code to Stata from VS Code and Jupyter notebooks
The Stata commands specified in .config/stata/stata_requirements.txt

Tip

If you want to add Stata requirements, add them to .config/stata/stata_requirements.txt and then run just stata-install-packages

The just get-started command should set up your VS Code environment to work with Stata.

Confirm that .venv/etc/nbstata.conf contains settings like the ones below. stata_dir should be the Stata installation directory that corresponds to your STATA_CMD value, and edition should match STATA_EDITION.

[nbstata]
stata_dir = C:\Program Files\StataNow19
edition = se
splash = False
graph_format = png
graph_width = 5.5in
graph_height = 4in
echo = None
missing = .
browse_auto_height = True

If you don't see that information, add the relevant information to .venv/etc/nbstata.conf. Alternatively, you can add the information above to ~/.config/nbstata/nbstata.conf.
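
Because nbstata.conf uses INI syntax, the check above can also be scripted with Python's standard configparser module. This is a minimal sketch under that assumption; read_nbstata_settings and edition_matches are hypothetical helper names, not part of nbstata or this template.

```python
import configparser


def read_nbstata_settings(conf_text: str) -> dict[str, str]:
    """Return the key/value pairs of the [nbstata] section."""
    # interpolation=None keeps characters such as % in paths untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("nbstata"):
        raise ValueError("missing [nbstata] section")
    return dict(parser["nbstata"])


def edition_matches(conf_text: str, stata_edition: str) -> bool:
    """Compare the configured edition with STATA_EDITION, ignoring case."""
    settings = read_nbstata_settings(conf_text)
    return settings.get("edition", "").strip().lower() == stata_edition.strip().lower()
```

Passing the contents of .venv/etc/nbstata.conf and the STATA_EDITION value from .env to edition_matches returns False when the two disagree.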

See the nbstata User Guide for more information.

Verify your Stata configuration:

# Test basic Stata access
just stata-check-installation

Install the vscode-stata extension for VS Code.

There are three options for testing the nbstata integration in VS Code:

Run the code in scripts/demo/nbstata-demo.do.
Run or render the code in scripts/demo/nbstata-demo.qmd.
Run or render the code in scripts/demo/nbstata-demo.ipynb.

In each of the cases above, make sure that you select the nbstata Jupyter kernel located at .venv/Scripts/python.exe (Windows) or .venv/bin/python (macOS, Linux).

Troubleshooting

Command not found errors:

Verify Stata path in .env file
Check that Stata is installed and accessible
Ensure quotes around paths with spaces (Windows)

Permission errors (macOS/Linux):

Use sudo when creating symlinks
Check file permissions on Stata executable

Batch mode issues:

Ensure your Stata license supports batch processing
Some Stata commands may not work in batch mode

Project Structure

This template follows best practices for Stata project organization:

├── data/
│   ├── raw/           # Original, immutable data files
│   ├── clean/         # Cleaned data (intermediate)
│   └── final/         # Analysis-ready datasets
├── scripts/           # Code
│   ├── demo/          # Demo scripts
│   └── do/            # Stata do-files
│       ├── 00_run.do      # Master do-file
│       ├── 01_data_cleaning.do
│       ├── 02_data_preparation.do
│       ├── 03_descriptive_analysis.do
│       ├── 04_main_analysis.do
│       ├── 05_robustness_checks.do
│       └── 06_generate_figures.do
├── ado/               # User-written Stata packages
├── analysis/logs/     # Log files from Stata runs
├── outputs/
│   ├── tables/        # Regression tables (.tex files)
│   └── figures/       # Figures (.pdf files)
├── documentation/     # Project documentation
└── SConstruct         # statacons workflow definition

The scripts/demo/nbstata-demo.qmd file provides a Quarto notebook example for interactive Stata analysis.

Workflow Features

Automated Dependency Management

statacons integration: Automatically tracks file dependencies and rebuilds only what's necessary
Reproducible environments: Stata packages managed in local ado/ folder
Version control friendly: All outputs are generated, not committed
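
The idea of rebuilding only what is necessary can be illustrated with a content-hash check. This is a conceptual sketch only, not how statacons or SCons is implemented: the function names and the in-memory recorded dictionary are hypothetical, and the real build system also tracks details such as command lines and implicit dependencies.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hash a file's contents so edits are detected regardless of timestamps."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(target: Path, sources: list[Path], recorded: dict[str, str]) -> bool:
    """True when the target is missing or any source changed since the last build."""
    if not target.exists():
        return True
    return any(recorded.get(str(src)) != file_digest(src) for src in sources)


def record_build(sources: list[Path], recorded: dict[str, str]) -> None:
    """Remember the current source digests after a successful build."""
    for src in sources:
        recorded[str(src)] = file_digest(src)
```

Under this scheme, editing a do-file changes its digest and marks every output that lists it as a source as stale, while untouched steps are skipped.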

Best Practice Implementation

IPA Data Standards: Follows IPA Data Cleaning Guide and Stata coding best practices
Data Carpentry Methods: Implements research-grade programming techniques for data exploration, transformation, and combination
Standardized coding style: Implements IPA, Data Carpentry, DIME Analytics, and Sean Higgins guidelines
Defensive programming: Uses assert statements and quality checks throughout
Advanced programming: Includes loops, macros, temporary files, and modular programming
Extended missing values: Implements IPA's .d/.o/.n/.r/.s conventions
Code quality enforcement: Integrated stata_linter for style checking and best practices
Reproducible package management: Requirements-based Stata package installation system
Comprehensive logging: All Stata runs generate detailed log files
Publication-ready outputs: Tables in LaTeX format, figures in PDF

Using the Template

1. Data Preparation

Place raw data in data/raw/
Modify scripts/do/01_data_cleaning.do for your data cleaning steps
Modify scripts/do/02_data_preparation.do for analysis sample creation

2. Analysis

Update analysis scripts (03_descriptive_analysis.do, 04_main_analysis.do, 05_robustness_checks.do)
Modify scripts/do/06_generate_figures.do for your visualization needs
Run the entire pipeline with scons, or run individual steps with scons [target]

3. IPA Visualizations (Recommended for IPA Staff)

For IPA staff, install the ipaplots package for branded visualizations:

net install github, from("https://haghish.github.io/github/")
github install PovertyAction/ipaplots

The template automatically detects and uses the IPA theme when available, falling back to default schemes otherwise.

4. Package Management and Environment Reproducibility

Stata lacks a built-in package manager, making reproducible environments challenging. This template provides a requirements-based system:

# Install all required packages from requirements file
just stata-install-packages

Package Requirements File: .config/stata/stata_requirements.txt lists the required packages with their installation sources:

# Format: package_name,install_source,install_command
estout,ssc,ssc install estout
reghdfe,ssc,ssc install reghdfe
ipaplots,github,github install PovertyAction/ipaplots
stata_linter,net,net install stata_linter, from(https://raw.githubusercontent.com/worldbank/stata-linter/main)
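
One detail of this format is that the install command itself may contain commas, as in the stata_linter line, so a reader should split on the first two commas only. The sketch below illustrates that; StataPackage and parse_requirements are hypothetical names, and this is not necessarily how just stata-install-packages reads the file.

```python
from typing import NamedTuple


class StataPackage(NamedTuple):
    name: str
    source: str
    command: str


def parse_requirements(text: str) -> list[StataPackage]:
    """Parse 'package_name,install_source,install_command' lines."""
    packages: list[StataPackage] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at most twice so commas inside the command are preserved.
        parts = line.split(",", 2)
        if len(parts) != 3:
            raise ValueError(f"expected 3 comma-separated fields: {line!r}")
        name, source, command = (part.strip() for part in parts)
        packages.append(StataPackage(name, source, command))
    return packages
```

Each returned entry keeps the full command string, so the stata_linter entry still ends with its from(...) option.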

5. Code Quality with stata_linter

This template integrates stata_linter from the World Bank DIME team for enforcing Stata coding best practices:

# Lint all Stata do-files and generate Excel report
just lint-stata

# Lint a specific do-file
just lint-stata-file scripts/do/01_data_cleaning.do

# Check if stata_linter is installed
just stata-check-linter

# Install stata_linter (included in package requirements)
just stata-install-packages

The linter checks for:

Variable naming conventions
Proper use of global macros for file paths
Consistent indentation and spacing
Deprecated command usage
Best practices for loops and conditionals

Linting reports are saved to analysis/logs/stata_linter_report.xlsx with detailed feedback on code quality issues.

6. Generate Reports

Create publication-ready reports that automatically include your Stata outputs:

# Generate complete analysis and report
just full-analysis-report

# Or generate report from existing outputs
just render-report

# Preview report in browser
just preview-report

The Quarto report template automatically integrates your Stata outputs including LaTeX tables and PDF figures.

7. Outputs

Tables will be generated in outputs/tables/ (LaTeX format)
Figures will be generated in outputs/figures/ (PDF format, with IPA branding when available)
Log files will be saved in analysis/logs/
Reports will be generated in reports/ (PDF, HTML, or Typst format)

Acknowledgments and References

This template builds upon established best practices and tools from the development economics and data science communities:

Primary Guidelines and Standards

IPA Data Cleaning Guide (Website): Comprehensive guide for data cleaning best practices
- Organization: Innovations for Poverty Action (IPA)
- Covers: Raw data management, variable management, dataset documentation, data aggregation
IPA Stata Tutorials (Website): Stata coding standards and best practices
- Organization: Innovations for Poverty Action (IPA)
- Covers: Stata syntax, data processing, coding standards
Data Carpentry Stata Economics (Website): Research-grade Stata programming curriculum
- Organization: Data Carpentry
- Covers: Data exploration, quality assessment, transformation, combination, programming, loops, advanced techniques
- License: CC BY 4.0

Core Dependencies

statacons (GitHub | Documentation): Python package for managing Stata workflows
- Authors: Brian Quistorff and colleagues
- License: MIT License
ipaplots (GitHub): IPA-branded Stata graphing scheme
- Authors: Ronny Condor, Kelly Montaño (IPA Peru)
- Organization: Innovations for Poverty Action
- Features: Professional visualization theme with IPA branding

Coding Standards and Best Practices

Sean Higgins Stata Guide (GitHub): Comprehensive coding style and workflow recommendations
- Author: Sean Higgins
- License: Creative Commons
DIME Analytics Data Handbook (Website): World Bank DIME team coding standards
- Organization: World Bank Development Impact Evaluation (DIME)
- License: MIT License
World Bank Reproducible Research Repository (GitHub): Guidelines for reproducible research
- Organization: World Bank
- License: Mozilla Public License 2.0

Development Tools

uv (Documentation): Fast Python package installer and resolver
Just (GitHub): Command runner for development tasks
Quarto (Website): Scientific and technical publishing system

Advanced Workflow with SCons

Automated Build System (Recommended) - SConstruct

just stata-full     # Complete pipeline with build system
# OR use scons directly:
scons              # Builds entire analysis pipeline
scons data         # Builds only data cleaning/preparation
scons analysis     # Builds only analysis outputs
scons figures      # Builds only figures
scons -c           # Clean all outputs

License

This template is released under the MIT License. See LICENSE for details.

While this template is MIT licensed, please respect the licenses of the constituent tools and respect the intellectual contributions of the referenced guides and best practices.
