Skip to content

Example: EpiEstim R package

robin-thompson edited this page Mar 6, 2019 · 35 revisions

In our manuscript (cited at the bottom of this page), analyses are conducted using the EpiEstim R software package. As an example of how this can be done, here we provide example R code for conducting the analysis underlying Fig. 2 of that paper.

Downloading the EpiEstim package

First, download the EpiEstim R package by typing in the following in R:

install.packages("devtools")
library(devtools)
install_github("annecori/EpiEstim", force = TRUE)
library(EpiEstim)

Loading the data and estimating the reproduction number

Now, save the files 'DataS1.csv' and 'DataS2.csv' - both available with the paper - on your computer, and navigate in R to that folder by typing

setwd('INSERT FILE PATH HERE')

Now load the data files by typing

disease_incidence_data <- read.csv("DataS1.csv", header = FALSE)
serial_interval_data <- read.csv("DataS2.csv", header = FALSE)

Set the column names of the serial interval data and convert the serial interval data into a data frame

names <- c("EL", "ER", "SL", "SR")
colnames(serial_interval_data) <- names
serial_interval_data <- as.data.frame(serial_interval_data)

Now call the EpiEstim function that estimates R using the data.

R_estimate <- estimate_r(disease_incidence_data, si_data = serial_interval_data, method="si_from_data", config=list(t_start=2:9, t_end=7:14, n1 = 500, n2 = 100, seed = 1, mcmc_control=list(init.pars=NULL, burnin=3000, thin=10, seed=1), si_parametric_distr = "off1G", plot=TRUE))

The terms and parameters we have included here are as follows:

  • disease_incidence_data: This tells R which data frame contains the serial interval data.

  • serial_interval_data <- as.data.frame(serial_interval_data): This tells R which data frame contains the serial interval data.

  • method="si_from_data": This tells EpiEstim that the serial interval data that we are inputting consists of interval-censored data describing timings of symptom appearance in known infector-recipient pairs.

  • t_start=2:9: This is a vector taking values 2,3,...,9, which are the start times of the intervals in which the reproduction number is estimated.

  • t_end=7:14: The equivalent vector but with end times of the intervals in which R is estimated.

  • n1 = 500, n2 = 100: These parameters determine the number of steps in the MCMC estimation of the serial interval, and the number of estimates to be drawn from those estimates in building an estimate of the reproduction number.

  • seed = 1: This sets a seed used for the reproduction number estimation, so that results are reproducible exactly.

  • mcmc_control=list(init.pars=NULL, burnin=3000, thin=10, seed=1): This sets the parameters for the MCMC estimation of the serial interval.

  • si_parametric_distr = "off1G": This specifies a gamma distribution offset by one day for the serial interval.

  • plot=TRUE: This tells EpiEstim that we wish to see a plot of our results.

Output

The output of the above code is then the resulting estimates of the reproduction number throughout the outbreak, which should look like this:

example

Clone this wiki locally