易翊翼
20241011
This is an empirical research project template designed to provide a standardized project structure for:
- Version control
- Code synchronization between local machine and HPC
- Decoupling of code and data
- Separation of build and analysis phases
The template structure follows "Code and Data for the Social Sciences: A Practitioner's Guide" by Matthew Gentzkow and Jesse M. Shapiro. It also includes config.do and config.py files that set up paths and packages for Stata and Python.
├── analysis                  # Analysis phase
│   ├── code                  # Analysis code
│   └── data
│       ├── input ←←←←←←|     # Panel data for descriptive stats and regressions (4)
│       │     ↓         |
│       └── output      |     # Generated tables and figures (5)
├── build               |     # Data construction phase
│   ├── code            |     # Data processing code
│   └── data            |
│       ├── raw         |     # Raw data (1)
│       │     ↓         |
│       ├── temp        |     # Temporary files, merge keys, etc. (2)
│       │     ↓         |
│       └── processed →→|     # Processed databases (3)
├── README.md                 # Project documentation
├── README.py                 # Directory tree generator
└── resource                  # Related papers and materials
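
The tree lists README.py as a directory tree generator. Its actual contents are not shown here, so the following is only a minimal sketch of how such a generator could work; the function name `render_tree` and the choice to skip hidden entries are assumptions made for illustration.

```python
from pathlib import Path


def render_tree(root, prefix=""):
    """Return the lines of a box-drawing tree for everything under root."""
    root = Path(root)
    # Directories first, then files, each group sorted by name.
    # Hidden entries such as .git or .gitkeep are skipped (an assumption).
    entries = sorted(
        (p for p in root.iterdir() if not p.name.startswith(".")),
        key=lambda p: (p.is_file(), p.name.lower()),
    )
    lines = []
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        lines.append(f"{prefix}{'└── ' if last else '├── '}{entry.name}")
        if entry.is_dir():
            # Children of the last entry need no vertical connector.
            child_prefix = prefix + ("    " if last else "│   ")
            lines.extend(render_tree(entry, child_prefix))
    return lines
```

Calling `print("\n".join(render_tree(".")))` from the project root would print a tree similar to the one above, without the comments and data-flow arrows, which have to be added by hand.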
- `build/data/raw`: Store raw data (read-only)
- `build/data/processed`: Store processed data
- `build/data/temp`: Store intermediate files
- `analysis/data/input`: Store analysis-ready data
- `analysis/data/output`: Store analysis results

- `build/code`: Data cleaning and construction code
- `analysis/code`: Analysis code
- Each code file should have clear documentation
- Use `.gitignore` for large data files
- Use `.gitkeep` to maintain empty directories
- Regular code commits
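
Git does not track empty directories, which is why the data folders need a `.gitkeep` placeholder. As a rough sketch (the helper name `add_gitkeep` is hypothetical and not part of the template), the placeholders could be created like this:

```python
from pathlib import Path


def add_gitkeep(root):
    """Create an empty .gitkeep in every empty directory under root."""
    root = Path(root)
    created = []
    # Materialize the listing first so new files do not affect the walk.
    for folder in [root, *root.rglob("*")]:
        if not folder.is_dir():
            continue
        # Never touch Git's own metadata directory.
        if ".git" in folder.relative_to(root).parts:
            continue
        if not any(folder.iterdir()):
            keep = folder / ".gitkeep"
            keep.touch()
            created.append(keep)
    return created
```

This only handles directories that are empty on disk. A folder that already holds git-ignored data files still looks empty to Git, so it would need its `.gitkeep` added separately.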
- Use `config.py` for Python paths and packages
- Use `config.do` for Stata paths and packages
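
The template's own config.py is not reproduced here. The sketch below only illustrates the idea of keeping all paths in one place so the same code runs on a local machine and on HPC; the function names, the dictionary keys, and the optional separate data root are assumptions.

```python
import importlib.util
from pathlib import Path


def project_paths(code_root, data_root=None):
    """Map short names to project folders.

    data_root may point somewhere other than code_root (for example a
    scratch disk on HPC), which keeps code and data decoupled.
    """
    code_root = Path(code_root)
    data_root = Path(data_root) if data_root else code_root
    return {
        "build_code": code_root / "build" / "code",
        "analysis_code": code_root / "analysis" / "code",
        "raw": data_root / "build" / "data" / "raw",
        "temp": data_root / "build" / "data" / "temp",
        "processed": data_root / "build" / "data" / "processed",
        "input": data_root / "analysis" / "data" / "input",
        "output": data_root / "analysis" / "data" / "output",
    }


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

In a real config.py, `code_root` would typically be derived from the location of the config file itself, and a script would start by checking that `missing_packages([...])` returns an empty list for the packages it needs.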
- Data Security
  - Don't commit sensitive data
  - Use `.gitignore` for large files
- Code Standards
  - Keep code clean and documented
  - Use meaningful names
  - Add appropriate comments
- Performance
  - Use chunking for large datasets
  - Choose appropriate data structures
- Collaboration
  - Regular code sync
  - Keep documentation updated
  - Follow project standards
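
To make the chunking advice under Performance concrete, here is a small standard-library sketch that processes a CSV file a fixed number of rows at a time instead of loading it whole. The function names and the `column_mean` example are illustrative assumptions, not part of the template.

```python
import csv
from itertools import islice


def read_in_chunks(path, chunk_size):
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk


def column_mean(path, column, chunk_size=100_000):
    """Mean of a numeric column, holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in read_in_chunks(path, chunk_size):
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    return total / count if count else float("nan")
```

With pandas, the same pattern is available through `pd.read_csv(path, chunksize=n)`, which returns an iterator of DataFrames.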
- Regular dependency updates
- Documentation maintenance
- Temporary file cleanup
- Data backup