MathildeChen/SOYBEAN_PRED_COMP

Comparison of methods to aggregate climate data to predict crop yield: an application to soybean

Summary

This repository contains the scripts supporting a paper that aims to identify the best data-driven approach to predict soybean yields from climate inputs and to examine the impact of the climate data aggregation method on predictive performance. These analyses were first conducted at the global scale and then at the scale of single countries, in order to examine the robustness of the conclusions. The countries studied are the United States of America and Brazil, two major soybean-producing areas.

All analyses were done using R version 4.2.2.

Packages required:

  • for data management: tidyverse, stringr, lubridate, CCMHr
  • for graphics: cowplot (plot arrangement), wesanderson and RColorBrewer (color palettes)
  • for spatial data management and maps production: terra, raster, rnaturalearth, rnaturalearthdata, sf, sp, rworldmap, rmapshaper, tidygeocoder
  • for parallelization of the analyses: parallel, doParallel, foreach
  • for random forest models fitting: ranger

Packages specific to dimension reduction:

Analysis steps:

Step 0: Home-made functions used to derive the different climate predictors

Step 1: Data pre-processing and training dataset constitution

The data (not provided in this repository) come from the global dataset of historical yields for major crops (GDHY, Iizumi and Sakai, 2020), the ERA5-Land database (Hersbach et al., 2023), and the SPAM dataset on irrigation practices (Yu et al., 2020), all of which are freely available (see references below). The data cover 3447 sites worldwide from 1981 to 2016.

The scripts used to perform this step are:

  • 01_1_Yield_data.R: selection of the sites with enough soybean production (>1% of the surface dedicated to soybean) and of those with 0 soybean (located in regions with unfavorable conditions for crop production)
  • 01_2_Climate_data.R: derivation of daily, monthly, and seasonal climate variables for minimum and maximum temperatures, mean precipitation, solar radiation, vapor pressure deficit, and evapotranspiration
  • 01_Dataset_preparation.R: merging datasets together
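The site selection rule described for 01_1_Yield_data.R can be illustrated with a minimal base R sketch. The table, its column names (`site_id`, `soy_area_frac`), and the values are hypothetical and are not taken from the repository; only the 1% threshold and the zero-soybean category come from the description above.

```r
# Hypothetical site table: one row per site, with the fraction of its
# surface dedicated to soybean (names and values are illustrative only).
sites <- data.frame(
  site_id       = c("A", "B", "C", "D"),
  soy_area_frac = c(0.25, 0.004, 0, 0.012),
  stringsAsFactors = FALSE
)

# Keep sites above the 1% threshold, plus the sites with no soybean at all.
producing <- sites$soy_area_frac > 0.01
no_soy    <- sites$soy_area_frac == 0
selected  <- sites[producing | no_soy, ]

# Label the two groups so they can be told apart downstream.
selected$type <- ifelse(selected$soy_area_frac > 0, "soybean", "zero_soybean")
print(selected)
```

Site "B" is dropped here because it grows some soybean but stays below the 1% threshold.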

References and access to the datasets:

Step 2: Deriving the different climate predictors

The temporal resolutions derived are:

  • 'Daily' = Cumulative daily values over the growing season were computed for each climate variable.
  • 'Monthly' = Monthly average of non-cumulative daily climate data.
  • 'Seasonal' = Average over the entire soybean growing season of non-cumulative daily climate data.
  • 'Standardized seasonal' = Average of all rescaled variables (i.e., each climate variable had a mean of zero and a standard deviation of one).
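As a rough illustration of these temporal resolutions, the base R sketch below derives them for one simulated site-year. The dates, season length, and variable names are assumptions made for the example. For the standardized seasonal case, only the rescaling step is shown; how the rescaled variables are then combined is described in the paper, not here.

```r
set.seed(1)

# Hypothetical daily maximum temperature for one site-year over a
# 150-day growing season starting on 1 May (illustrative values).
days <- seq(as.Date("2000-05-01"), by = "day", length.out = 150)
tmax <- 25 + 5 * sin(seq(0, pi, length.out = 150)) + rnorm(150)

# 'Daily': cumulative daily values over the growing season.
daily_cum <- cumsum(tmax)

# 'Monthly': average of the non-cumulative daily values per calendar month.
monthly <- tapply(tmax, format(days, "%Y-%m"), mean)

# 'Seasonal': average over the whole growing season.
seasonal <- mean(tmax)

# Rescaling step behind 'Standardized seasonal': across site-years, each
# seasonal variable is centred to mean 0 and scaled to standard deviation 1.
seasonal_means <- cbind(tmax = rnorm(50, 28, 2), prec = rexp(50, 1 / 3))
rescaled <- scale(seasonal_means)
```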

The five different dimension-reduction techniques applied to cumulative daily data and monthly averages were:

  • principal component analysis (PCA),
  • functional principal component analysis (FPCA),
  • multivariate functional principal component analysis (MFPCA),
  • partial least squares regression (PLSR),
  • functional partial least squares regression (FPLSR)
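To make the idea of replacing climate series by a few scores concrete, here is a minimal sketch of the simplest of these techniques, PCA, using `prcomp` from base R on simulated monthly averages. The data, the matrix layout, and the 90% variance threshold are assumptions for illustration and do not reflect the settings used in 00_Functions_dimensions_reduction.R.

```r
set.seed(42)

# Hypothetical input: 200 site-years x 7 monthly means of one climate variable.
n <- 200
m <- 7
site_level <- rnorm(n, mean = 20, sd = 3)
X <- sapply(seq_len(m), function(j) {
  site_level + 4 * sin(pi * j / (m + 1)) + rnorm(n, sd = 1)
})
colnames(X) <- paste0("month_", seq_len(m))

# PCA on centred and scaled monthly values.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Keep the first components reaching 90% of the variance
# (illustrative threshold, not the one used in the paper).
k <- which(cumsum(var_explained) >= 0.9)[1]

# The scores are what would replace the monthly values as model predictors.
scores <- pca$x[, seq_len(k), drop = FALSE]
dim(scores)
```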

See supplementary material of the publication for methodological and computational details.

These steps are performed using the following scripts:

  • 02_Dimension_reduction.R: uses the functions from the 00_Functions_dimensions_reduction.R script to derive the different scores at the global scale, then in the US, then in Brazil; this step takes a long time to run, so the analyses were parallelized to run separate sets of computations (e.g., deriving all the FPCA scores, then all the MFPCA scores, etc.);
  • 02_Dimension_reduction_check_recomputation.R: script used to check that the scores were correctly computed;
  • 02_Dimension_reduction_merge_scores.R: merges the scores produced by several rounds of analyses (due to the parallelization of the analyses);
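The run-in-parallel-then-merge pattern can be sketched as follows. This is not the repository's code: it uses `mclapply` from the base `parallel` package to stay free of extra dependencies, and `compute_scores`, the job names, and the `id` key are hypothetical stand-ins for the actual rounds of computation.

```r
library(parallel)

set.seed(10)
# Hypothetical climate matrix: 50 site-years x 4 variables.
climate <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)

# Hypothetical stand-in for one round of computation: returns two PCA
# scores per site-year, keyed by an 'id' column used later for merging.
compute_scores <- function(method, data) {
  pc  <- prcomp(data, center = TRUE, scale. = (method == "scaled"))
  out <- data.frame(id = seq_len(nrow(data)), pc$x[, 1:2])
  names(out)[-1] <- paste0(method, "_PC", 1:2)
  out
}

# One job per set of scores; forking is unavailable on Windows,
# so fall back to a single core there.
jobs    <- c("unscaled", "scaled")
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
rounds  <- mclapply(jobs, compute_scores, data = climate, mc.cores = n_cores)

# Merge the per-round tables into one table of scores.
merged <- Reduce(function(a, b) merge(a, b, by = "id"), rounds)
names(merged)
```

Keying every round on the same identifier makes the merge independent of the order in which the rounds finish.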

Step 3: Models development and comparison

In this step we simultaneously evaluate:

(i) the impact of the temporal resolution of climate data (i.e., daily, monthly, seasonal, or standardized seasonal),

(ii) a large range of dimension-reduction techniques to aggregate climate data (i.e., PCA, FPCA, MFPCA, PLSR, FPLSR, or no reduction), and

(iii) modeling techniques to predict soybean yields from climate data (multiple linear regression or random forest).

Model prediction accuracy was evaluated using two metrics (root mean square error and an equivalent of R², the model efficiency) through two cross-validation procedures ensuring good model transferability in time and in space.
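For reference, the two metrics can be written in a few lines of base R. The model efficiency below follows the standard definition (one minus the ratio of the residual sum of squares to the total sum of squares of the observations); the exact formulation used in the paper should be checked in its supplementary material. The observed and predicted values are made up.

```r
# Root mean square error between observed and predicted yields.
rmse <- function(obs, pred) {
  sqrt(mean((obs - pred)^2))
}

# Model efficiency (standard definition, assumed here): 1 is a perfect fit,
# 0 is no better than predicting the mean of the observations.
model_efficiency <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Made-up yields (t/ha) for illustration.
obs  <- c(2, 3, 4, 5)
pred <- c(2.5, 2.5, 4.5, 4.5)

rmse(obs, pred)              # 0.5
model_efficiency(obs, pred)  # 0.8
```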

This is performed in the following scripts:

  • 03_1_Models_prediction.R; the partial dependence of each predictor included in the models is produced in the 03_4_Models_pdp.R script;
  • 03_2_Models_prediction_geo_years_cross_validation.R: performs the cross-validation based on sites' locations and years. See more details in the Supplementary data of the paper.
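The principle of cross-validating on years or on locations is that all rows sharing a year (or a site) are held out together, so the model is always tested on years or places it has not seen. The sketch below shows this with a grouped k-fold in base R. The simulated data, the fold counts, and the use of `lm` as the model are assumptions; the actual procedure is described in the Supplementary data of the paper.

```r
set.seed(7)

# Hypothetical training table: 30 sites x 12 years, one seasonal predictor.
d <- expand.grid(site = 1:30, year = 2001:2012)
d$tmax  <- rnorm(nrow(d), mean = 28, sd = 2)
d$yield <- 3 - 0.1 * (d$tmax - 28) + rnorm(nrow(d), sd = 0.3)

# Grouped k-fold cross-validation: folds are assigned to whole groups
# (years or sites), never to individual rows.
grouped_cv <- function(data, group, k) {
  groups        <- unique(data[[group]])
  fold_of_group <- sample(rep(seq_len(k), length.out = length(groups)))
  fold          <- fold_of_group[match(data[[group]], groups)]
  pred          <- rep(NA_real_, nrow(data))
  for (f in seq_len(k)) {
    held_out <- fold == f
    fit <- lm(yield ~ tmax, data = data[!held_out, ])
    pred[held_out] <- predict(fit, newdata = data[held_out, ])
  }
  sqrt(mean((data$yield - pred)^2))  # out-of-sample RMSE
}

rmse_years <- grouped_cv(d, "year", k = 4)  # transferability in time
rmse_sites <- grouped_cv(d, "site", k = 5)  # transferability in space
```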

Step 4: Sensitivity analysis

Model development and comparison were also performed at the scale of the US and then at the scale of Brazil using the 04_Models_application_climate_change.R script.

Step 5: Figures production and sensitivity analyses

The scripts listed below produce the figures and supplementary figures of the paper:

  • 05_Figures.R
  • 05_Figure_reconstruction_variables.R, 05_Figure_reconstruction_variables_table.R, and 05_Figure_reconstruction_variables_fplsr.R: produce the data behind Figure 3 of the paper, i.e., how well the different dimension-reduction techniques re-estimate the initial data.
  • 05_Supp_Figures.R: supplementary figures
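The notion of re-estimating the initial data from a reduced set of scores can be illustrated for PCA with base R. The sketch below is a generic example on simulated data, not the computation behind Figure 3; the matrix `X` and the helper `reconstruct` are hypothetical.

```r
set.seed(3)

# Hypothetical data: 100 site-years x 6 correlated monthly values.
X <- matrix(rnorm(600), nrow = 100, ncol = 6) %*%
  matrix(runif(36), nrow = 6, ncol = 6)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Rebuild the data from the first k components: scores times transposed
# loadings, then add back the column means removed by centring.
reconstruct <- function(pca, k) {
  idx   <- seq_len(k)
  x_hat <- pca$x[, idx, drop = FALSE] %*% t(pca$rotation[, idx, drop = FALSE])
  sweep(x_hat, 2, pca$center, "+")
}

# Reconstruction error as a function of the number of components kept.
rmse_by_k <- sapply(seq_len(ncol(X)), function(k) {
  sqrt(mean((X - reconstruct(pca, k))^2))
})
round(rmse_by_k, 3)
```

The error decreases as components are added and reaches zero (up to rounding) when all six are kept.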

Authors:

Affiliations:

$1$ Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA PS, 91120 Palaiseau, France

$2$ CIRAD, UMR PHIM, F-34398 Montpellier, France

$3$ PHIM, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France

$4$ Université Paris-Saclay, AgroParisTech, INRAE, UMR Agronomie, 91120 Palaiseau, France

Related publication

Published 3 May 2024 • © 2024 The Author(s). Published by IOP Publishing Ltd

Environmental Research Letters, Volume 19, Number 5

Citation: Mathilde Chen et al 2024 Environ. Res. Lett. 19 054049

DOI 10.1088/1748-9326/ad42b5
