Pipeline to analyze Chinese outbound investment (OFDI) and the impact of the ERP (Ecological Redlines Policy) implemented in 2017.
Data source: Dr. Derek Scissors, PhD - Stanford University / American Enterprise Institute
Documentation
- Pipeline and Implementation: docsPython.pdf, docsR.pdf. They include full details about implementation, method, and technical specs.
- Metrics Interpretation: itpt.pdf. It covers COFDI-ERP analysis with regression, classification, time series, causality, and other metrics.
Files: data.csv (transactions 2005-2024) + naturalearth_lowres.zip (maps)
- Process: Load CSV → Clean data → Convert numeric columns → Filter valid years
- Result: DataFrame organized and ready for analysis
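The load and clean steps above can be sketched as follows. This is a minimal, dependency-free illustration rather than the code in code.py: the column names `Year`, `Sector`, and `Valor_USD`, the sample rows, and the helper `load_clean` are all assumptions, and the real pipeline produces a DataFrame rather than a list of dicts.

```python
import csv
import io

# Hypothetical sample; the real data.csv columns may be named differently.
SAMPLE_CSV = """Year,Sector,Valor_USD
2005,Energy,"1,200"
2017,Metals,350
2031,Energy,500
2019,Transport,n/a
"""


def load_clean(text, first_year=2005, last_year=2024):
    """Parse CSV text, convert numeric columns, keep rows with a valid year."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        try:
            year = int(raw["Year"])
            # Strip thousands separators before converting to float.
            value = float(raw["Valor_USD"].replace(",", ""))
        except (KeyError, ValueError, AttributeError):
            continue  # drop rows whose numeric fields cannot be parsed
        if first_year <= year <= last_year:
            rows.append({"Year": year, "Sector": raw["Sector"].strip(), "Valor_USD": value})
    return rows


clean_rows = load_clean(SAMPLE_CSV)
print(clean_rows)
```

Rows with an unparseable value or a year outside 2005-2024 are dropped silently here; a real run would probably log how many rows were discarded.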
Files: code.py (orchestration) + requirements.txt (dependencies)
- Command: python .\code\code.py
- Function: Runs all 11 steps automatically (Python)
Generated plots:
- r_style_world_maps.png → World maps colored by investment
- r_style_investment_by_sector.png → Sector evolution in 3 phases
- r_style_greenfield_donuts.png → Donut charts Greenfield vs M&A
- python_style_boxplot_greenfield.png → Comparative statistical analysis
Files:
- summary_statistics.csv → Means, medians, standard deviations
- eda_target_distribution_log_vs_raw.png → Value distribution
- eda_total_investment_over_time.png → Full time series
- eda_h1_sector_shift_summary.csv → Sector shifts after ERP
Files:
- summary_interaction_ols_by_phase.csv → Interaction effects
- summary_mediation_ols_by_phase.csv → Mediation effects
- Phases: GG 1.0 (2005-2012) | GG 2.0 (2013-2016) | GG 3.0+ERP (2017-2024)
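The phase boundaries listed above map directly onto a lookup by year. The sketch below shows one way to assign a phase label; the function name `gg_phase` and the label strings are illustrative and may not match how code.py encodes the `Fase_GG` column.

```python
# Phase boundaries as listed above (inclusive years).
GG_PHASES = [
    ("GG 1.0", 2005, 2012),
    ("GG 2.0", 2013, 2016),
    ("GG 3.0+ERP", 2017, 2024),
]


def gg_phase(year):
    """Return the Going Global phase label for a transaction year."""
    for label, start, end in GG_PHASES:
        if start <= year <= end:
            return label
    raise ValueError(f"year {year} is outside the 2005-2024 sample")


print([gg_phase(y) for y in (2005, 2012, 2013, 2016, 2017, 2024)])
```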
Files:
- regression_models_summary_otimizada.csv → Performance XGBoost vs LightGBM
- Metrics: MAPE | Log RMSE
- Target: Predict log_Valor_USD (investment value)
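For readers unfamiliar with the two regression metrics, here is a small sketch of how they are commonly defined. It assumes that MAPE is computed on raw USD values and that Log RMSE is the root mean squared error between natural logs of strictly positive values; the pipeline may use a different log transform (for example `log1p`), so treat the formulas as an illustration rather than the exact computation behind the summary CSV.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def log_rmse(actual, predicted):
    """RMSE between natural logs of strictly positive values."""
    squared = [(math.log(a) - math.log(p)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up investment values, for illustration only.
y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
print(mape(y_true, y_pred), log_rmse(y_true, y_pred))
```

MAPE penalizes relative error on each deal, while Log RMSE compresses the influence of very large transactions, which is presumably why the target is modeled on a log scale.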
Files:
- classification_models_summary_otimizada.csv → AUC-ROC | Accuracy
- Target: Alvo_Adaptativo (1=high value, 0=low value)
- Models: XGBoost, LightGBM, CatBoost with hyperparameter tuning
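AUC-ROC can be read as the probability that a randomly chosen high-value deal receives a higher score than a randomly chosen low-value one. The sketch below computes it directly from that definition on made-up labels and scores; it is a dependency-free illustration, not the evaluation code used in the pipeline, and `auc_roc` is a hypothetical helper name.

```python
def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1=high value, 0=low value) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, scores))
```

The pairwise loop is quadratic in the number of rows, which is fine for a demonstration; library implementations use a sort-based computation instead.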
Files:
- arima_forecast.png → ARIMA forecasts
- prophet_forecast.png → Facebook Prophet forecasts
- prophet_forecast_data.csv → 5-year projection data
- Goal: Predict future investment trends
Files:
- dml_summary_att.csv → ERP causal effect via Double Machine Learning
- markov_prob.png → Regime change probabilities
- markov_switching_summary.txt → Regime model details
- Question: Did the ERP policy cause changes in investments?
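The idea behind Double Machine Learning can be shown with a toy partialling-out estimator. This sketch makes several simplifying assumptions that differ from the pipeline: it uses plain linear fits as nuisance models instead of machine-learning learners, a single covariate, two cross-fitting folds, and it estimates the coefficient of a partially linear model rather than the ATT reported in dml_summary_att.csv. All data and helper names are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def dml_effect(y, d, x, n_folds=2):
    """Cross-fitted partialling-out estimate of the effect of d on y given x."""
    n = len(y)
    res_y, res_d = [], []
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance models are fitted on the training fold only.
        ay, by = fit_line([x[i] for i in train], [y[i] for i in train])
        ad, bd = fit_line([x[i] for i in train], [d[i] for i in train])
        # Residuals are computed on the held-out fold.
        for i in test:
            res_y.append(y[i] - (ay + by * x[i]))
            res_d.append(d[i] - (ad + bd * x[i]))
    # Regress outcome residuals on treatment residuals (no intercept).
    return sum(ry * rd for ry, rd in zip(res_y, res_d)) / sum(rd * rd for rd in res_d)


# Synthetic data with a known treatment effect of 2.0.
x = [float(i) for i in range(12)]
d = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
y = [1.0 + 3.0 * xi + 2.0 * di for xi, di in zip(x, d)]
theta = dml_effect(y, d, x)
print(round(theta, 6))
```

Because the synthetic outcome is exactly linear, the estimate recovers the planted effect; with real investment data the nuisance models would need to be flexible learners and the estimate would come with a standard error.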
Files:
- shap_summary_plot.png → Impact of each variable on the model
- shap_importance_plot.png → Feature importance
- Key variables: valor_roll_mean_3 (moving average) | Sector | Region
Created features:
- valor_roll_mean_2/3 → Moving averages over 2/3 periods
- post_ERP → Post-2017 flag
- Fase_GG → Three Going Global phases
- Alvo_Binario → High/low value classification
- policy_interaction → Time × policy interaction
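A few of the features listed above can be sketched on a toy yearly series. The definitions here are assumptions, not taken from code.py: the moving average is a trailing window that includes the current period, `post_ERP` is 1 from 2017 onward, and `policy_interaction` is a time index (years since the start of the series) multiplied by `post_ERP`. The numbers are made up.

```python
def rolling_mean(values, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


years = [2014, 2015, 2016, 2017, 2018]
totals = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up investment totals

features = []
for year, roll2, roll3 in zip(years, rolling_mean(totals, 2), rolling_mean(totals, 3)):
    post_erp = int(year >= 2017)  # assumed cutoff: 2017 onward
    features.append({
        "Year": year,
        "valor_roll_mean_2": roll2,
        "valor_roll_mean_3": roll3,
        "post_ERP": post_erp,
        "policy_interaction": (year - years[0]) * post_erp,  # assumed time x policy term
    })

for row in features:
    print(row)
```

If the moving averages feed a model that predicts the same period's value, a window that includes the current period leaks the target; shifting the window back by one period is the usual fix.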
- Data Load → Cleaning and validation
- Feature Engineering → Create ML variables
- Visualizations → 20+ professional plots (via code.py and code.R)
- EDA → Statistical exploratory analysis
- GG Phase Analysis → OLS models by period
- Regression → Predict investment value (XGBoost/LightGBM)
- Classification → Identify high value (AUC-ROC)
- Time Series → ARIMA & Prophet
- Causal Models → Double ML & Markov Switching
- SHAP → Model interpretability
- Consolidation → Save all results
# 1. Install Python dependencies
python -m pip install -r requirements.txt
# 2. Run Python pipeline
python ./code/code.py
# 3. Run R code for advanced plots
Rscript ./code/code.R
# 4. View results
# Python: ./results/
# R: ./results/R/