A comprehensive template repository for reproducible Stata analysis projects using modern workflow tools and best practices. This template integrates statacons for dependency management with IPA's Data Cleaning Guide and Stata coding standards, along with established practices from leading development economics research groups.
Warning
NEVER COMMIT DATA FILES TO GITHUB.
NEVER USE AI ASSISTANTS WITH PERSONALLY IDENTIFIABLE DATA.
YOU ARE REQUIRED TO REMOVE IDENTIFYING INFORMATION BEFORE CONNECTING AI
ASSISTANTS OR STORING DATA IN ANY UNENCRYPTED LOCATION.
[Include a section describing the objectives of this repository.]
[Include a description of the directory structure]
This repository can be run from either Stata or VS Code.
To run from Stata, use the do-files in the scripts/do subdirectory.
We recommend VS Code. To set up your coding environment in VS Code to work with Stata, follow these instructions:
- Ensure that Stata is installed and your license has been added. Note where Stata is installed on your operating system.
Common locations:
  - Windows: C:\Program Files\Stata18\StataSE-64.exe
  - macOS: /Applications/Stata/StataSE.app/Contents/MacOS/StataSE
  - Linux: /usr/local/stata18/stata-se
- Copy the .env-example file to .env and set the STATA_CMD and STATA_EDITION environment variables to match your operating system and version of Stata.
# Common Stata installation paths
# Windows, Stata 18
STATA_CMD='C:\Program Files\Stata18\StataSE-64.exe'
STATA_EDITION='se'
# Windows, Stata 19
# STATA_CMD='C:\Program Files\StataNow19\StataSE-64.exe'
# macOS
# STATA_CMD='/Applications/Stata/StataSE.app/Contents/MacOS/StataSE'
# Linux or WSL
# STATA_CMD='/usr/local/stata18/stata-se'
- Make sure that you have Just installed:
# Windows
winget install --id Casey.Just -e
# Mac/Linux using Homebrew
brew install just
- Run the following command to install the necessary software:
just get-started
This will make sure that the following are installed:
  - uv for Python environment management
  - Git for version control
  - GitHub CLI for interacting with GitHub
  - Quarto for literate programming, writing reports, and generating presentations
  - markdownlint-cli2 for formatting Markdown documents (including Quarto Markdown)
  - A Python virtual environment, created via uv in .venv/ at the root of this directory
  - nbstata, installed within the Python virtual environment. This enables you to send code to Stata from VS Code and Jupyter notebooks.
  - The Stata packages listed in .config/stata/stata_requirements.txt
Tip
To add Stata packages, list them in .config/stata/stata_requirements.txt
and then run just stata-install-packages.
- The just get-started command should set up your VS Code environment to work with Stata.
Confirm that you see the following in .venv/etc/nbstata.conf:
stata_dir should point to the Stata installation directory for the STATA_CMD you set (on Windows, the folder that contains the executable), and edition should match STATA_EDITION.
[nbstata]
stata_dir = C:\Program Files\StataNow19
edition = se
splash = False
graph_format = png
graph_width = 5.5in
graph_height = 4in
echo = None
missing = .
browse_auto_height = True
If you don't see that information, add the relevant settings to .venv/etc/nbstata.conf.
Alternatively, you can add the information above to ~/.config/nbstata/nbstata.conf.
See the nbstata User Guide for more information.
- Verify your Stata configuration:
# Test basic Stata access
just stata-check-installation
- Install the vscode-stata extension for VS Code.
- There are three options for testing the nbstata integration in VS Code:
  - Try running the code in scripts/demo/nbstata-demo.do.
  - Try running or rendering the code in scripts/demo/nbstata-demo.qmd.
  - Try running or rendering the code in scripts/demo/nbstata-demo.ipynb.
In each case, make sure that you select the nbstata Jupyter kernel from the virtual environment
at .venv/Scripts/python.exe (Windows) or .venv/bin/python (macOS, Linux).
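The consistency check described above (stata_dir and edition in nbstata.conf versus STATA_CMD and STATA_EDITION in .env) can be automated with a short Python script. This is a hypothetical sketch, not part of the template or of nbstata: read_env, as_path, and check_config are illustrative names, and the rule that stata_dir must be a parent folder of STATA_CMD is an assumption that matches the Windows, macOS, and Linux examples in this README.

```python
"""Hypothetical sketch: compare .env settings with nbstata.conf settings."""
import configparser
from pathlib import PurePosixPath, PureWindowsPath


def read_env(text: str) -> dict[str, str]:
    """Parse simple KEY='value' lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Strip one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def as_path(p: str):
    """Treat strings with backslashes or a drive letter as Windows paths."""
    if "\\" in p or (len(p) > 1 and p[1] == ":"):
        return PureWindowsPath(p)
    return PurePosixPath(p)


def check_config(env_text: str, conf_text: str) -> list[str]:
    """Return a list of problems; an empty list means the two files agree."""
    env = read_env(env_text)
    conf = configparser.ConfigParser(interpolation=None)
    conf.read_string(conf_text)
    if not conf.has_section("nbstata"):
        return ["nbstata.conf has no [nbstata] section"]
    nb = conf["nbstata"]
    problems = []

    cmd = as_path(env.get("STATA_CMD", ""))
    stata_dir = as_path(nb.get("stata_dir", ""))
    # Assumption: stata_dir is a folder that contains the Stata executable.
    if type(cmd) is not type(stata_dir) or stata_dir not in cmd.parents:
        problems.append(f"stata_dir {stata_dir} does not contain STATA_CMD {cmd}")

    edition = env.get("STATA_EDITION", "").lower()
    if nb.get("edition", "").lower() != edition:
        problems.append(f"edition {nb.get('edition')!r} != STATA_EDITION {edition!r}")
    return problems
```

For example, STATA_CMD='C:\Program Files\StataNow19\StataSE-64.exe' with STATA_EDITION='se' agrees with the sample nbstata.conf shown above, so check_config returns an empty list.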
Command not found errors:
- Verify the Stata path in the .env file
- Check that Stata is installed and accessible
- Make sure paths containing spaces are quoted (Windows)
Permission errors (macOS/Linux):
- Use sudo when creating symlinks
- Check file permissions on the Stata executable
Batch mode issues:
- Ensure your Stata license supports batch processing
- Some Stata commands may not work in batch mode
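The path and permission checks above can be bundled into one small diagnostic. This is a hypothetical sketch, not a template recipe: diagnose_stata_cmd is an illustrative name, and it only inspects the STATA_CMD path with the standard library (os.path and os.access). It does not start Stata, so it cannot detect license or batch-mode problems.

```python
"""Hypothetical sketch: diagnose common STATA_CMD problems before running Stata."""
import os


def diagnose_stata_cmd(stata_cmd: str) -> list[str]:
    """Return human-readable problems with the STATA_CMD path (empty if none)."""
    if not stata_cmd:
        return ["STATA_CMD is empty; set it in .env"]
    if not os.path.exists(stata_cmd):
        return [f"{stata_cmd} does not exist; check the Stata path"]
    if not os.path.isfile(stata_cmd):
        return [f"{stata_cmd} is not a file (point STATA_CMD at the executable)"]
    # On macOS/Linux a missing execute bit shows up as a permission error.
    if not os.access(stata_cmd, os.X_OK):
        return [f"{stata_cmd} is not executable; check file permissions"]
    return []


if __name__ == "__main__":
    # Reads STATA_CMD from the environment, e.g. after loading .env.
    for message in diagnose_stata_cmd(os.environ.get("STATA_CMD", "")) or ["OK"]:
        print(message)
```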
This template follows best practices for Stata project organization:
├── data/
│ ├── raw/ # Original, immutable data files
│ ├── clean/ # Cleaned data (intermediate)
│ └── final/ # Analysis-ready datasets
├── scripts/ # Code
│ ├── demo/ # Demo scripts
│ └── do/ # Stata do-files
│ ├── 00_run.do # Master do-file
│ ├── 01_data_cleaning.do
│ ├── 02_data_preparation.do
│ ├── 03_descriptive_analysis.do
│ ├── 04_main_analysis.do
│ ├── 05_robustness_checks.do
│ └── 06_generate_figures.do
├── ado/ # User-written Stata packages
├── analysis/logs/ # Log files from Stata runs
├── outputs/
│ ├── tables/ # Regression tables (.tex files)
│ └── figures/ # Figures (.pdf files)
├── documentation/ # Project documentation
└── SConstruct # statacons workflow definition
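If you start from an empty folder rather than from this template, the folder layout above can be recreated with a few lines of Python. This is a hypothetical helper (create_layout is an illustrative name): it only creates the folders listed in the tree and deliberately does not create do-files, the SConstruct file, or any data.

```python
"""Hypothetical sketch: recreate the template's folder layout."""
from pathlib import Path

# Folders taken from the directory tree in this README.
LAYOUT = [
    "data/raw",
    "data/clean",
    "data/final",
    "scripts/demo",
    "scripts/do",
    "ado",
    "analysis/logs",
    "outputs/tables",
    "outputs/figures",
    "documentation",
]


def create_layout(root: Path) -> list[Path]:
    """Create any missing folders under root and return the ones created."""
    created = []
    for rel in LAYOUT:
        folder = root / rel
        if not folder.exists():
            folder.mkdir(parents=True)  # also creates parents such as data/
            created.append(folder)
    return created
```

Running it twice is safe: the second call finds every folder present and returns an empty list.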
The scripts/demo/nbstata-demo.qmd file provides a Quarto notebook example for interactive Stata analysis.
- statacons integration: Automatically tracks file dependencies and rebuilds only what's necessary
- Reproducible environments: Stata packages managed in a local ado/ folder
- Version control friendly: All outputs are generated, not committed
- IPA Data Standards: Follows IPA Data Cleaning Guide and Stata coding best practices
- Data Carpentry Methods: Implements research-grade programming techniques for data exploration, transformation, and combination
- Standardized coding style: Follows IPA, Data Carpentry, DIME Analytics, and Sean Higgins guidelines
- Defensive programming: Uses assert statements and quality checks throughout
- Advanced programming: Includes loops, macros, temporary files, and modular programming
- Extended missing values: Implements IPA's .d/.o/.n/.r/.s conventions
- Code quality enforcement: Integrated stata_linter for style checking and best practices
- Reproducible package management: Requirements-based Stata package installation system
- Comprehensive logging: All Stata runs generate detailed log files
- Publication-ready outputs: Tables in LaTeX format, figures in PDF
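The "rebuilds only what's necessary" behavior comes from SCons, which statacons builds on: by default SCons decides whether a target is out of date by comparing content signatures (hashes of file contents) with those recorded in the previous build, rather than relying on timestamps. The Python sketch below illustrates that idea only; it is not statacons or SCons code, and needs_rebuild, record_build, and the JSON signature store are hypothetical.

```python
"""Hypothetical sketch of content-signature rebuild checks (the idea SCons uses)."""
import hashlib
import json
from pathlib import Path


def signature(path: Path) -> str:
    """Hash a file's contents, so re-saving it unchanged does not trigger a rebuild."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rebuild(sources: list[Path], target: Path, store: Path) -> bool:
    """True if the target is missing or any source changed since the last build."""
    if not target.exists():
        return True
    old = json.loads(store.read_text()) if store.exists() else {}
    return any(old.get(str(src)) != signature(src) for src in sources)


def record_build(sources: list[Path], store: Path) -> None:
    """Remember current source signatures after a successful build.

    Simplification: a real build tool keeps signatures per target; this sketch
    keeps one shared store for brevity.
    """
    old = json.loads(store.read_text()) if store.exists() else {}
    old.update({str(src): signature(src) for src in sources})
    store.write_text(json.dumps(old, indent=2))
```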
- Place raw data in data/raw/
- Modify scripts/do/01_data_cleaning.do for your data cleaning steps
- Modify scripts/do/02_data_preparation.do for analysis sample creation
- Update the analysis scripts (03_descriptive_analysis.do, 04_main_analysis.do, 05_robustness_checks.do)
- Modify scripts/do/06_generate_figures.do for your visualization needs
- Run the entire pipeline with scons, or individual steps with scons [target]
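Running individual steps works because the build tool knows which outputs depend on which do-files. As a hypothetical illustration of that dependency reasoning, the Python sketch below finds every step that must rerun after a given file changes, in an order that respects dependencies. The DEPENDS_ON edges are an assumed pipeline for illustration, not the template's actual SConstruct; steps_to_rerun is an illustrative name.

```python
"""Hypothetical sketch: which pipeline steps rerun after a file changes?"""
from graphlib import TopologicalSorter

# Assumed dependencies (step -> files/steps it reads). Edit to match your SConstruct.
DEPENDS_ON = {
    "01_data_cleaning.do": {"data/raw"},
    "02_data_preparation.do": {"01_data_cleaning.do"},
    "03_descriptive_analysis.do": {"02_data_preparation.do"},
    "04_main_analysis.do": {"02_data_preparation.do"},
    "05_robustness_checks.do": {"04_main_analysis.do"},
    "06_generate_figures.do": {"02_data_preparation.do", "04_main_analysis.do"},
}


def steps_to_rerun(changed: str) -> list[str]:
    """Return the changed step (if it is one) plus everything downstream of it."""
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    dirty = {changed}
    # Visiting nodes in topological order guarantees inputs are marked before outputs.
    for node in order:
        if DEPENDS_ON.get(node, set()) & dirty:
            dirty.add(node)
    return [node for node in order if node in dirty and node in DEPENDS_ON]
```

With these assumed edges, editing 04_main_analysis.do reruns 04, 05, and 06, while editing 03_descriptive_analysis.do reruns only 03.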
For IPA staff, install the ipaplots package for branded visualizations:
net install github, from("https://haghish.github.io/github/")
github install PovertyAction/ipaplots
The template automatically detects and uses the IPA theme when available, falling back to default schemes otherwise.
Stata lacks a built-in package manager, making reproducible environments challenging. This template provides a requirements-based system:
# Install all required packages from requirements file
just stata-install-packages
Package requirements file: scripts/setup/stata_requirements.txt lists the required packages with their installation sources:
# Format: package_name,install_source,install_command
estout,ssc,ssc install estout
reghdfe,ssc,ssc install reghdfe
ipaplots,github,github install PovertyAction/ipaplots
stata_linter,net,net install stata_linter, from(https://raw.githubusercontent.com/worldbank/stata-linter/main)
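Because the install command itself can contain commas (as in the stata_linter line), a parser for this format has to split each line at the first two commas only. The Python sketch below assumes that rule and turns the entries into a do-file of install commands. It is a hypothetical illustration, not the script behind just stata-install-packages; parse_requirements, to_do_file, and Requirement are illustrative names.

```python
"""Hypothetical sketch: parse stata_requirements.txt and emit install commands."""
from typing import NamedTuple


class Requirement(NamedTuple):
    package: str
    source: str   # e.g. ssc, github, net
    command: str  # full Stata install command; may itself contain commas


def parse_requirements(text: str) -> list[Requirement]:
    """Parse package_name,install_source,install_command lines; skip # comments."""
    reqs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = [p.strip() for p in line.split(",", 2)]  # split at first two commas only
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"line {lineno}: expected 3 fields, got {line!r}")
        reqs.append(Requirement(*parts))
    return reqs


def to_do_file(reqs: list[Requirement]) -> str:
    """Emit the install commands as a do-file, one command per line."""
    lines = ["* Generated from stata_requirements.txt"]
    if any(req.source == "github" for req in reqs):
        # github install needs the github package first (see the ipaplots instructions).
        lines.append('net install github, from("https://haghish.github.io/github/")')
    lines += [req.command for req in reqs]
    return "\n".join(lines) + "\n"
```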
This template integrates stata_linter from the World Bank DIME team for enforcing Stata coding best practices:
# Lint all Stata do-files and generate Excel report
just lint-stata
# Lint a specific do-file
just lint-stata-file scripts/do/01_data_cleaning.do
# Check if stata_linter is installed
just stata-check-linter
# Install stata_linter (included in package requirements)
just stata-install-packages
The linter checks for:
- Variable naming conventions
- Proper use of global macros for file paths
- Consistent indentation and spacing
- Deprecated command usage
- Best practices for loops and conditionals
Linting reports are saved to analysis/logs/stata_linter_report.xlsx with detailed feedback on code quality issues.
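To illustrate the kind of rule behind the file-path check, here is a rough Python sketch that flags do-file lines containing hard-coded absolute paths instead of global macros. It is a hypothetical approximation written for this README, not stata_linter's implementation or rule set; find_hardcoded_paths and its regular expression are assumptions, and it will miss some cases and flag some false positives.

```python
"""Hypothetical sketch: flag hard-coded absolute paths in a do-file."""
import re

# An opening quote followed by a drive letter (C:\ or C:/), /Users/, /home/, or ~/.
ABSOLUTE_PATH = re.compile(r'"(?:[A-Za-z]:[\\/]|/Users/|/home/|~/)')


def find_hardcoded_paths(do_text: str) -> list[int]:
    """Return 1-based line numbers that look like they hard-code a path."""
    flagged = []
    for lineno, line in enumerate(do_text.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing // comments
        if code.lstrip().startswith("*"):
            continue  # whole-line * comment
        if re.match(r"\s*global\b", code):
            continue  # defining a root global once is the intended pattern
        if ABSOLUTE_PATH.search(code):
            flagged.append(lineno)
    return flagged
```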
Create publication-ready reports that automatically include your Stata outputs:
# Generate complete analysis and report
just full-analysis-report
# Or generate report from existing outputs
just render-report
# Preview report in browser
just preview-report
The Quarto report template automatically integrates your Stata outputs, including LaTeX tables and PDF figures.
- Tables will be generated in outputs/tables/ (LaTeX format)
- Figures will be generated in outputs/figures/ (PDF format, with IPA branding when available)
- Log files will be saved in analysis/logs/
- Reports will be generated in reports/ (PDF, HTML, or Typst format)
This template builds upon established best practices and tools from the development economics and data science communities:
- IPA Data Cleaning Guide (Website): Comprehensive guide for data cleaning best practices
  - Organization: Innovations for Poverty Action (IPA)
  - Covers: Raw data management, variable management, dataset documentation, data aggregation
- IPA Stata Tutorials (Website): Stata coding standards and best practices
  - Organization: Innovations for Poverty Action (IPA)
  - Covers: Stata syntax, data processing, coding standards
- Data Carpentry Stata Economics (Website): Research-grade Stata programming curriculum
  - Organization: Data Carpentry
  - Covers: Data exploration, quality assessment, transformation, combination, programming, loops, advanced techniques
  - License: CC BY 4.0
- statacons (GitHub | Documentation): Python package for managing Stata workflows
  - Authors: Brian Quistorff and colleagues
  - License: MIT License
- ipaplots (GitHub): IPA-branded Stata graphing scheme
  - Authors: Ronny Condor, Kelly Montaño (IPA Peru)
  - Organization: Innovations for Poverty Action
  - Features: Professional visualization theme with IPA branding
- Sean Higgins Stata Guide (GitHub): Comprehensive coding style and workflow recommendations
  - Author: Sean Higgins
  - License: Creative Commons
- DIME Analytics Data Handbook (Website): World Bank DIME team coding standards
  - Organization: World Bank Development Impact Evaluation (DIME)
  - License: MIT License
- World Bank Reproducible Research Repository (GitHub): Guidelines for reproducible research
  - Organization: World Bank
  - License: Mozilla Public License 2.0
- uv (Documentation): Fast Python package installer and resolver
- Just (GitHub): Command runner for development tasks
- Quarto (Website): Scientific and technical publishing system
just stata-full # Complete pipeline with build system
# OR use scons directly:
scons # Builds entire analysis pipeline
scons data # Builds only data cleaning/preparation
scons analysis # Builds only analysis outputs
scons figures # Builds only figures
scons -c # Clean all outputs
This template is released under the MIT License. See LICENSE for details.
While this template is MIT licensed, please respect the licenses of the constituent tools and acknowledge the intellectual contributions of the referenced guides and best practices.