-
Notifications
You must be signed in to change notification settings - Fork 13
Example: EpiEstim R package
In our manuscript (cited at the bottom of this page), analyses are conducted using the EpiEstim R software package. As an example of how this can be done, here we provide example R code for conducting the analysis underlying Fig. 2 of that paper.
First, download the EpiEstim R package by typing in the following in R:
install.packages("devtools")
library(devtools)
install_github("annecori/EpiEstim", force = TRUE)
library(EpiEstim)
Now, save the files 'DataS1.csv' and 'DataS2.csv' - both available with the paper - on your computer, and navigate in R to that folder by typing
setwd('INSERT FILE PATH HERE')
Now load the data files by typing
disease_incidence_data <- read.csv("DataS1.csv", header = FALSE)
serial_interval_data <- read.csv("DataS2.csv", header = FALSE)
Set the column names of the serial interval data and convert the serial interval data into a data frame
names <- c("EL", "ER", "SL", "SR")
colnames(serial_interval_data) <- names
serial_interval_data <- as.data.frame(serial_interval_data)
Now call the EpiEstim function that estimates R using the data.
R_estimate <- estimate_r(disease_incidence_data, si_data = serial_interval_data, method="si_from_data", config=list(t_start=2:9, t_end=7:14, n1 = 500, n2 = 100, seed = 1, mcmc_control=list(init.pars=NULL, burnin=3000, thin=10, seed=1), si_parametric_distr = "off1G", plot=TRUE))
The terms and parameters we have included here are as follows:
-
disease_incidence_data: This tells R which data frame contains the serial interval data.
-
serial_interval_data <- as.data.frame(serial_interval_data): This tells R which data frame contains the serial interval data.
-
method="si_from_data": This tells EpiEstim that the serial interval data that we are inputting consists of interval-censored data describing timings of symptom appearance in known infector-recipient pairs.
-
t_start=2:9: This is a vector taking values 2,3,...,9, which are the start times of the intervals in which the reproduction number is estimated.
-
t_end=7:14: The equivalent vector but with end times of the intervals in which R is estimated.
-
n1 = 500, n2 = 100: These parameters determine the number of steps in the MCMC estimation of the serial interval, and the number of estimates to be drawn from those estimates in building an estimate of the reproduction number.
-
seed = 1: This sets a seed used for the reproduction number estimation, so that results are reproducible exactly.
-
mcmc_control=list(init.pars=NULL, burnin=3000, thin=10, seed=1): This sets the parameters for the MCMC estimation of the serial interval.
-
si_parametric_distr = "off1G": This specifies a gamma distribution offset by one day for the serial interval.
-
plot=TRUE: This tells EpiEstim that we wish to see a plot of our results.
The output of the above code is then the resulting estimates of the reproduction number throughout the outbreak, which should look like this:

The authors request users to cite the original publication when referencing EpiEstim App, the format or results generated from it.
Thompson RN, Stockwin JE, van Gaalen RD, Polonsky JA, et al. Improved inference of time-varying reproduction numbers during infectious disease outbreaks. Epidemics (2019).