
Alison-Sousa/paper-erp-ofdi-model


📊 ERP-OFDI Model - Chinese Investments

🌟 About the Project

A pipeline to analyze Chinese outward foreign direct investment (OFDI) and the impact of the Ecological Redlines Policy (ERP), implemented in 2017.

Data source: Derek Scissors, PhD (Stanford University / American Enterprise Institute)

Documentation

  • Pipeline and Implementation:

  • Metrics Interpretation:

    • itpt.pdf: covers the COFDI-ERP analysis, including regression, classification, time series, causality, and other metrics.

🔄 Project Structure

📁 data/ → Stage 1: Data Load

Files: data.csv (transactions 2005-2024) + naturalearth_lowres.zip (maps)

  • Process: Load CSV → Clean data → Convert numeric columns → Filter valid years
  • Result: DataFrame organized and ready for analysis
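The four-step load process can be sketched in pandas as shown below. This is a minimal sketch. The column names `Year` and `Valor_USD` are assumptions, because the schema of data.csv is not documented here. Treating commas as thousands separators is also an assumption.

```python
import io

import pandas as pd

# Assumed column names: data.csv's real schema is not documented in this README.
NUMERIC_COLS = ["Year", "Valor_USD"]


def load_transactions(source, min_year=2005, max_year=2024):
    """Load CSV -> clean data -> convert numeric columns -> filter valid years."""
    df = pd.read_csv(source)
    # Clean: trim stray whitespace in headers and drop fully empty rows
    df.columns = df.columns.str.strip()
    df = df.dropna(how="all")
    # Convert: drop thousands separators; unparseable entries become NaN
    for col in NUMERIC_COLS:
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(",", "", regex=False), errors="coerce"
        )
    # Filter: require a value and a year inside the study window
    df = df.dropna(subset=NUMERIC_COLS)
    df = df[df["Year"].between(min_year, max_year)].copy()
    df["Year"] = df["Year"].astype(int)
    return df.reset_index(drop=True)


# Usage with a tiny in-memory CSV (illustrative rows, not real deals)
sample = io.StringIO('Year , Valor_USD\n2004,100\n2010,"1,200"\n2018,n/a\n2020,350\n')
df = load_transactions(sample)
print(df)
```

The 2004 row falls outside the study window and is dropped. The "n/a" row is also dropped because it has no usable value.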

📁 code/ → Main Run

Files: code.py (orchestration) + requirements.txt (dependencies)

  • Command: python ./code/code.py
  • Function: runs all 11 steps in sequence (Python)

📁 results/ → Outputs

🎨 r_style_plots/ → Stage 3: Visualizations

Generated plots:

  • r_style_world_maps.png → World maps colored by investment
  • r_style_investment_by_sector.png → Sector evolution in 3 phases
  • r_style_greenfield_donuts.png → Donut charts: greenfield vs. M&A
  • python_style_boxplot_greenfield.png → Comparative boxplot of greenfield investment

📁 eda/ → Stage 4: Exploratory Analysis

Files:

  • summary_statistics.csv → Means, medians, standard deviations
  • eda_target_distribution_log_vs_raw.png → Value distribution
  • eda_total_investment_over_time.png → Full time series
  • eda_h1_sector_shift_summary.csv → Sector shifts after ERP
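The summary statistics and the log-vs-raw comparison can be sketched with the standard library. This sketch assumes the log transform is log(1 + x) (`math.log1p`), since the pipeline's exact transform is not stated here. The deal values are made up for illustration.

```python
import math
import statistics


def summarize(values):
    """Mean, median and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
    }


# Hypothetical deal values in USD millions, with one large outlier
raw = [100, 120, 150, 200, 5000]
log_values = [math.log1p(v) for v in raw]

raw_stats = summarize(raw)
log_stats = summarize(log_values)
print(raw_stats)
print(log_stats)
```

On the raw scale, the single large deal pulls the mean far above the median. After the log transform, the two are close. This is the usual reason to model a log target such as log_Valor_USD.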

📁 phase_analysis/ → Stage 5: Going Global (GG) Phase Analysis

Files:

  • summary_interaction_ols_by_phase.csv → Interaction effects
  • summary_mediation_ols_by_phase.csv → Mediation effects
  • Phases: GG 1.0 (2005-2012) | GG 2.0 (2013-2016) | GG 3.0+ERP (2017-2024)
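Below is one plausible shape for an interaction OLS. It assumes the model y = b0 + b1·post + b2·x + b3·(post·x), where post is a phase indicator and x is a hypothetical covariate. The actual regressors behind summary_interaction_ols_by_phase.csv are not listed here. The coefficient b3 measures how the slope of x changes once post switches on.

```python
import numpy as np


def interaction_ols(y, x, post):
    """OLS of y on [1, post, x, post*x]; returns (b0, b1, b2, b3)."""
    X = np.column_stack([np.ones_like(x), post, x, post * x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Noise-free synthetic data, so OLS should recover the true coefficients
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
post = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y = 1.0 + 2.0 * post + 0.5 * x + 1.5 * post * x
coef = interaction_ols(y, x, post)
print(np.round(coef, 6))
```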

📁 models/regression/ → Stage 6: Predictive Models

Files:

  • regression_models_summary_otimizada.csv → XGBoost vs. LightGBM performance
  • Metrics: MAPE | Log RMSE
  • Target: log_Valor_USD (log of the investment value)
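The two regression metrics can be written out as below. This sketch computes MAPE on the USD scale after back-transforming log predictions. It computes log RMSE directly on the log scale. Both choices, and the log(1 + x) transform, are assumptions, since the README does not say which scale each metric uses.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be non-zero)."""
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def log_rmse(log_true, log_pred):
    """Root mean squared error computed on log-scale values."""
    sq = [(t - p) ** 2 for t, p in zip(log_true, log_pred)]
    return math.sqrt(sum(sq) / len(sq))


# Predictions made on the log scale, back-transformed to USD for MAPE
log_true = [math.log1p(100.0), math.log1p(200.0)]
log_pred = [math.log1p(110.0), math.log1p(180.0)]
usd_true = [math.expm1(v) for v in log_true]
usd_pred = [math.expm1(v) for v in log_pred]
print(mape(usd_true, usd_pred), log_rmse(log_true, log_pred))
```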

📁 models/classification/ → Stage 7: Binary Classification

Files:

  • classification_models_summary_otimizada.csv → AUC-ROC | Accuracy
  • Target: Alvo_Adaptativo (1=high value, 0=low value)
  • Models: XGBoost, LightGBM, CatBoost with hyperparameter tuning
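The two classification metrics can be computed directly. Here AUC-ROC uses its rank interpretation, the Mann-Whitney statistic, which equals the area under the ROC curve. The 0.5 cutoff for accuracy is an assumption. The labels and scores are illustrative.

```python
def auc_roc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, cutoff=0.5):
    """Share of correct predictions after thresholding scores at `cutoff`."""
    hits = sum(int(s >= cutoff) == y for s, y in zip(scores, labels))
    return hits / len(labels)


labels = [0, 0, 1, 1]           # e.g. Alvo_Adaptativo (1 = high value)
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted probabilities
print(auc_roc(labels, scores), accuracy(labels, scores))
```

AUC does not depend on any cutoff, which is why it is the headline metric here. Accuracy changes with the chosen threshold.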

📁 models/timeseries/ → Stage 8: Time Series

Files:

  • arima_forecast.png → ARIMA forecasts
  • prophet_forecast.png → Facebook Prophet forecasts
  • prophet_forecast_data.csv → 5-year projection data
  • Goal: Predict future investment trends
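The sketch below shows how a fitted model turns history into a multi-year projection, using the simplest ARIMA case. That case is an AR(1) with intercept, ARIMA(1,0,0), estimated by conditional least squares. It is a stand-in, not the ARIMA order or the Prophet model the pipeline fits. The series is a toy, not annual OFDI totals.

```python
def fit_ar1(series):
    """Conditional least squares for y_t = c + phi * y_{t-1} + e_t, i.e. ARIMA(1,0,0)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi


def forecast_ar1(last, c, phi, steps):
    """Iterate the fitted recursion forward to get point forecasts."""
    out = []
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Noise-free toy series generated by y_t = 2 + 0.5 * y_{t-1}
series = [10.0, 7.0, 5.5, 4.75, 4.375, 4.1875]
c, phi = fit_ar1(series)
print(c, phi, forecast_ar1(series[-1], c, phi, steps=5))
```

With |phi| < 1, the forecasts decay toward the long-run mean c / (1 - phi), which is 4 here. This is why long-horizon ARIMA projections tend to flatten out.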

📁 models/causal/ → Stage 9: Causal Analysis

Files:

  • dml_summary_att.csv → ERP causal effect via Double Machine Learning
  • markov_prob.png → Regime change probabilities
  • markov_switching_summary.txt → Regime model details
  • Question: Did the ERP policy cause changes in investments?
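In its partially linear form, Double Machine Learning residualizes both the outcome and the treatment on the controls. It uses cross-fitting for this step, then regresses the outcome residual on the treatment residual. The sketch below assumes that form and uses OLS as a stand-in for the ML nuisance learners. It returns a single effect θ rather than the ATT reported in dml_summary_att.csv. The data are simulated with a true effect of 2.

```python
import numpy as np


def _ols_fit_predict(X_train, y_train, X_test):
    """Linear nuisance learner (OLS with intercept), standing in for an ML model."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta


def dml_plr(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    y_res = np.empty_like(y, dtype=float)
    d_res = np.empty_like(d, dtype=float)
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Nuisance models are fit on other folds and evaluated on this fold
        y_res[test] = y[test] - _ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - _ols_fit_predict(X[train], d[train], X[test])
    # Final stage: regress outcome residuals on treatment residuals
    return float(np.sum(d_res * y_res) / np.sum(d_res ** 2))


# Simulated data: treatment depends on X, true effect theta = 2
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
y = 2.0 * d + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
theta_hat = dml_plr(y, d, X)
print(round(theta_hat, 3))
```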

📁 models/shap/ → Stage 10: Interpretability

Files:

  • shap_summary_plot.png → Impact of each variable on the model
  • shap_importance_plot.png → Feature importance
  • Key variables: valor_roll_mean_3 (moving average) | Sector | Region
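SHAP values are Shapley values from cooperative game theory. For a handful of features, they can be computed exactly by enumerating coalitions. The sketch below does that for a toy model with an interaction, setting "absent" features to a baseline. That value function is an assumption. SHAP's explainers typically average over background data and use faster algorithms, such as TreeSHAP for tree models. Exact enumeration is exponential in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                # Marginal contribution of feature i to coalition s
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: additive terms plus a z0*z2 interaction."""
    return 3 * z[0] + 2 * z[1] + z[0] * z[2]


x, baseline = [1, 2, 3], [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to f(x) - f(baseline), the efficiency property. The z0·z2 interaction is split evenly between features 0 and 2.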

🛠️ Stage 2: Feature Engineering

Created features:

  • valor_roll_mean_2 / valor_roll_mean_3 → Moving averages over 2 and 3 periods
  • post_ERP → Post-2017 flag
  • Fase_GG → Three Going Global phases
  • Alvo_Binario → High/low value classification
  • policy_interaction → Time × policy interaction
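Below is a minimal pandas sketch of these features. Only the phase boundaries and the 2017 ERP year come from this README. Everything else is an assumption:

- One row per transaction, with hypothetical `Year` and `Valor_USD` columns.
- Rolling windows are counted in rows after sorting by year and include the current row.
- `post_ERP` is 1 from 2017 onward.
- `Alvo_Binario` uses a median split.
- `policy_interaction` uses years since 2005 as the time index.

```python
import pandas as pd

ERP_YEAR = 2017  # ERP start year, from this README


def add_features(df):
    """Sketch of the Stage 2 features; column names and rules partly assumed."""
    df = df.sort_values("Year").copy()
    # Moving averages over the current and preceding rows (window counted in rows)
    for w in (2, 3):
        df[f"valor_roll_mean_{w}"] = df["Valor_USD"].rolling(w, min_periods=1).mean()
    df["post_ERP"] = (df["Year"] >= ERP_YEAR).astype(int)
    # Right-inclusive bins give 2005-2012, 2013-2016, 2017-2024
    df["Fase_GG"] = pd.cut(
        df["Year"],
        bins=[2004, 2012, 2016, 2024],
        labels=["GG 1.0", "GG 2.0", "GG 3.0+ERP"],
    )
    df["Alvo_Binario"] = (df["Valor_USD"] > df["Valor_USD"].median()).astype(int)
    # Time x policy: years elapsed since 2005, switched on after the ERP
    df["policy_interaction"] = (df["Year"] - 2005) * df["post_ERP"]
    return df


sample = pd.DataFrame(
    {"Year": [2010, 2015, 2018, 2020], "Valor_USD": [100.0, 300.0, 200.0, 400.0]}
)
out = add_features(sample)
print(out)
```

If these features feed a model that predicts the same row's value, a lagged window (`.shift(1)` before `.rolling`) avoids using the target in its own predictor. The README does not say which variant the pipeline uses.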

📋 Summary of the 11 Automatic Steps (Python)

  1. Data Load → Cleaning and validation
  2. Feature Engineering → Create ML variables
  3. Visualizations → 20+ plots (via code.py and code.R)
  4. EDA → Statistical exploratory analysis
  5. GG Phase Analysis → OLS models by period
  6. Regression → Predict investment value (XGBoost/LightGBM)
  7. Classification → Identify high-value transactions (AUC-ROC)
  8. Time Series → ARIMA & Prophet
  9. Causal Models → Double ML & Markov Switching
  10. SHAP → Model interpretability
  11. Consolidation → Save all results
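The 11 steps run in a fixed order, and each builds on the previous step's output. Below is a minimal sketch of that orchestration pattern, using hypothetical step functions and a shared context dictionary. The actual structure of code.py is not shown in this README.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")


def run_pipeline(steps):
    """Run (name, function) steps in order, sharing one context dict."""
    context = {"completed": []}
    for number, (name, step) in enumerate(steps, start=1):
        log.info("Step %d/%d: %s", number, len(steps), name)
        step(context)  # an exception here stops the run at this step
        context["completed"].append(name)
    return context


# Hypothetical placeholder steps; code.py's real function names are not documented.
def load_data(ctx):
    ctx["values"] = [120.0, 80.0, 300.0]


def feature_engineering(ctx):
    ctx["mean_value"] = sum(ctx["values"]) / len(ctx["values"])


def consolidate(ctx):
    ctx["saved"] = True  # stand-in for writing files under results/


STEPS = [
    ("Data Load", load_data),
    ("Feature Engineering", feature_engineering),
    # ... steps 3-10 would be registered here in the same way ...
    ("Consolidation", consolidate),
]

result = run_pipeline(STEPS)
```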

🚀 Execution

```bash
# 1. Install Python dependencies (requirements.txt lives in code/)
python -m pip install -r ./code/requirements.txt

# 2. Run the Python pipeline
python ./code/code.py

# 3. Run the R code for advanced plots
Rscript ./code/code.R

# 4. View results
#    Python: ./results/
#    R: ./results/R/
```