Ebola simulation part 2: outbreak reconstruction

24 January 2018 / [practicals] / #genetics #simulation #response #outbreaker2 #epicurve #reproduction number #ebola

This practical is the second part of the response to a simulated Ebola Virus Disease (EVD) outbreak taking place in the city of Ankh, Republic of Morporkia. While the first part focussed on early assessments of transmissibility, this part explores more methodological options for estimating transmissibility, and provides an introduction to outbreak reconstruction using outbreaker2.

An update on the EVD outbreak in Ankh, Republic of Morporkia

After some rather concerning preliminary results on the new EVD outbreak in the city of Ankh, Republic of Morporkia, Public Health Morporkia (PHM) has sent you updates of the linelists and contact data. This time, PHM has also obtained Whole Genome Sequences (WGS) of the Ebola virus isolated in patients. As before, you are asked to assess the situation and produce evidence-based recommendations for informing the response.

Required packages

The following packages, available on CRAN, are needed for this practical:

readxl to read .xlsx files
ggplot2 for graphics
incidence for epicurves
epicontacts for contact data visualisation
EpiEstim for time-varying reproduction number estimation
distcrete to obtain discrete time delay distributions
epitrix for some practical tricks for epidemics analysis
ape for DNA sequence analysis
outbreaker2 to reconstruct the outbreak

To install these packages, use install.packages, e.g.:

install.packages("readxl")
install.packages("ggplot2")
install.packages("incidence")
install.packages("epicontacts")
install.packages("EpiEstim")
install.packages("distcrete")
install.packages("epitrix")
install.packages("ape")
install.packages("outbreaker2")

The new data

The data update includes new linelists and contact lists:

PHM-EVD-linelist-2017-11-25.xlsx: a linelist containing case information up to the 25th November 2017
PHM-EVD-contacts-2017-11-25.xlsx: a list of contacts reported between cases up to the 25th November, where from indicates a potential source of infection, and to the recipient of the contact.

To read into R, download these files and use the function read_xlsx() from the readxl package to import the data. Each import will create a data table stored as a tibble object. Call the first one linelist, and the second one contacts. For instance, your first command line could look like:

linelist <- readxl::read_xlsx("PHM-EVD-linelist-2017-11-25.xlsx")

Note that for further analyses, you will need to make sure all dates are stored as Date objects. This can be done using as.Date:

linelist$onset <- as.Date(linelist$onset)

Once imported, the data should look like:

## linelist: one line per case
linelist
## # A tibble: 50 x 4
##    case_id onset      sex      age
##    <chr>   <date>     <chr>  <dbl>
##  1 39e9dc  2017-10-10 female    62
##  2 664549  2017-10-16 male      28
##  3 b4d8aa  2017-10-17 male      54
##  4 51883d  2017-10-18 male      57
##  5 947e40  2017-10-20 female    23
##  6 9aa197  2017-10-20 female    66
##  7 e4b0a2  2017-10-21 female    13
##  8 af0ac0  2017-10-21 male      10
##  9 185911  2017-10-21 female    34
## 10 601d2e  2017-10-22 male      11
## # ... with 40 more rows

## contacts: pairs of cases with reported contacts
contacts
## # A tibble: 36 x 2
##    from   to    
##    <chr>  <chr> 
##  1 9aa197 426b6d
##  2 39e9dc a8e9d8
##  3 c2a389 95fc1d
##  4 51883d 778316
##  5 51883d e37897
##  6 933811 99abbe
##  7 51883d 185911
##  8 b4d8aa e4b0a2
##  9 605322 1875e2
## 10 b4d8aa b5ad13
## # ... with 26 more rows

Analysis of epidemiological data

Visualising contact data

After the initial stage of the outbreak, contact tracing has been maintained but started being sparser, and exposures haven’t been reported for all cases. Despite these limitations, contacts are still a valuable source of information. Using the function make_epicontacts in the epicontacts package, create a new epicontacts object called x, specifying that contacts are directed. When plotting the data, use the arguments node_shape and shapes (see ?vis_epicontacts) to distinguish males and females. For a list of available symbols and corresponding shape code, you can type epicontacts::codeawesome.

The results should look like:

## 
## /// Epidemiological Contacts //
## 
##   // class: epicontacts
##   // 50 cases in linelist; 36 contacts;  directed 
## 
##   // linelist
## 
## # A tibble: 50 x 4
##    id     onset      sex      age
##  * <chr>  <date>     <chr>  <dbl>
##  1 39e9dc 2017-10-10 female    62
##  2 664549 2017-10-16 male      28
##  3 b4d8aa 2017-10-17 male      54
##  4 51883d 2017-10-18 male      57
##  5 947e40 2017-10-20 female    23
##  6 9aa197 2017-10-20 female    66
##  7 e4b0a2 2017-10-21 female    13
##  8 af0ac0 2017-10-21 male      10
##  9 185911 2017-10-21 female    34
## 10 601d2e 2017-10-22 male      11
## # ... with 40 more rows
## 
##   // contacts
## 
## # A tibble: 36 x 2
##    from   to    
##    <chr>  <chr> 
##  1 9aa197 426b6d
##  2 39e9dc a8e9d8
##  3 c2a389 95fc1d
##  4 51883d 778316
##  5 51883d e37897
##  6 933811 99abbe
##  7 51883d 185911
##  8 b4d8aa e4b0a2
##  9 605322 1875e2
## 10 b4d8aa b5ad13
## # ... with 26 more rows

What can you say about these contacts? How would you interpret the different clusters?

Building epicurves

Using the same approach as in the first part of the practical, use incidence (from the incidence package) to compute and plot epicurves using dates of symptom onset. As the time series is now longer, compare daily incidence to weekly incidence.

You should obtain something like:

As there are no longer time intervals with ‘zero’ incidence on the weekly epicurve, which are problematic for log-linear regression, we can try fitting a model to these data; we do so here, with i being the weekly incidence:

f <- fit(i)
f
## <incidence_fit object>
## 
## $model: regression of log-incidence over time
## 
## $info: list containing the following items:
##   $r (daily growth rate):
## [1] 0.02689489
## 
##   $r.conf (confidence interval):
##            2.5 %     97.5 %
## [1,] -0.03115559 0.08494537
## 
##   $doubling (doubling time in days):
## [1] 25.77245
## 
##   $doubling.conf (confidence interval):
##         2.5 %    97.5 %
## [1,] 8.159917 -22.24792
## 
##   $pred: data.frame of incidence predictions (7 rows, 5 columns)
plot(i, color = "#c65353", fit = f)

How would you interpret this result? What are the limitations of this analysis?

Estimating transmissibility

Repeating the same analysis as in the early stage of the outbreak, we can use get_R from the earlyR package to estimate the reproduction number:

library(earlyR)

## parameters of the serial interval from practical part1, on the dataset
## ebola_sim_clean (outbreaks package)

mu <- 11.48 # mean serial interval
sd <- 7.38 # standard deviation

daily_i <- incidence(linelist$onset)
simple_R <- get_R(daily_i, si_mean = mu, si_sd =  sd, max_R = 5)
simple_R
## 
## /// Early estimate of reproduction number (R) //
##  // class: earlyR, list
## 
##  // Maximum-Likelihood estimate of R ($R_ml):
## [1] 1.286286
## 
## 
##  // $lambda:
##   0.04089622 0.05345593 0.06190884 0.06671221 0.06852841 0.06803361...
## 
##  // $dates:
## [1] "2017-10-11" "2017-10-12" "2017-10-13" "2017-10-14" "2017-10-15"
## [6] "2017-10-16"
## ...
## 
##  // $si (serial interval):
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.41975308641975
##     scale: 4.74428571428571

## R: likelihood function
plot(simple_R)

## force of infection
plot(simple_R, "lambdas")
abline(v = as.Date("2017-11-25"))

What do you make of these results?What is the main limitation of the estimation of the reproduction number ((R)) in this analysis? What assumption does it make about the outbreak?

Estimating time-varying transmissibility

When the assumption that (R) is constant over time becomes untenable, an alternative is the estimationg of time-varying transmissibility using the instantaneous reproduction number (R_t). This approach, introduced by Cori et al. (2013), is implemented in the package EpiEstim (function EstimateR). It esimates (R_t) for a succession of sliding time windows, using the same Poisson likelihood described in the first part. In the following, we use EstimateR to estimate transmissibility for 1-week sliding time windows:

library(EpiEstim)
library(ggplot2)

days <- seq_along(daily_i$dates)
Rt <- EstimateR(as.vector(daily_i$counts), days, days + 7, method = "ParametricSI",
                   Mean.SI = mu, Std.SI = sd)
## Warning in EstimateR(as.vector(daily_i$counts), days, days + 7, method =
## "ParametricSI", : You're estimating R too early in the epidemic to get the
## desired posterior CV.
Rt <- Rt$R[days,]
head(Rt, 10)
##    T.Start T.End  Mean(R)    Std(R) Quantile.0.025(R) Quantile.0.05(R)
## 1        1     8       NA        NA                NA               NA
## 2        2     9       NA        NA                NA               NA
## 3        3    10       NA        NA                NA               NA
## 4        4    11       NA        NA                NA               NA
## 5        5    12 7.760690 2.5868968          3.548682         4.048690
## 6        6    13 7.747323 2.3359057          3.867436         4.344845
## 7        7    14 6.674902 1.9268782          3.449019         3.851537
## 8        8    15 4.773032 1.4391233          2.382680         2.676806
## 9        9    16 3.437773 1.0871193          1.648546         1.865131
## 10      10    17 2.541335 0.8471116          1.162060         1.325794
##    Quantile.0.25(R) Median(R) Quantile.0.75(R) Quantile.0.95(R)
## 1                NA        NA               NA               NA
## 2                NA        NA               NA               NA
## 3                NA        NA               NA               NA
## 4                NA        NA               NA               NA
## 5          5.896094  7.475227         9.314937        12.446983
## 6          6.070950  7.513862         9.169754        11.946526
## 7          5.294658  6.490432         7.854455        10.127781
## 8          3.740239  4.629200         5.649375         7.360110
## 9          2.655985  3.323885         4.095710         5.399097
## 10         1.930749  2.447856         3.050292         4.075920
##    Quantile.0.975(R)
## 1                 NA
## 2                 NA
## 3                 NA
## 4                 NA
## 5          13.592581
## 6          12.952366
## 7          10.947973
## 8           7.979796
## 9           5.873368
## 10          4.451060

EpiEstim is not yet integrated with other RECON packages, so adding the results to existing incidence plots takes some customisation using ggplot2. Uses the following commands to add estimates of (R_r) to the daily incidence:

names(Rt) <- gsub("[(]R[)]", "", names(Rt))
Rt <- cbind.data.frame(as.data.frame(daily_i),
                       Rt)
plot(daily_i) +
    geom_ribbon(data = Rt, fill = "#c65353", alpha = .3,
                aes(x = dates, ymin = Quantile.0.025, ymax = Quantile.0.975)) +
    geom_line(data = Rt, aes(x = dates, y = Median), col = "#c65353", alpha = .8) +
    geom_hline(yintercept = 1, linetype = 2) +
    labs(title = "Weekly Rt: median, and 95% CI")
## Warning: Removed 11 rows containing missing values (geom_path).

How would you interpret this result? What is the caveat of this representation?

Finding who infected whom

To gain a better understanding of the transmission process, we can attempt to reconstruct plausible transmission trees using the dates of symptom onsets and limited contact data. This can be achieved using outbreaker2, which provides a modular platform for outbreak reconstruction. This package extends and replaces outbreaker, which in contrast was a static implementation of a specific transmission model (Jombart et al. 2014).

Looking at Whole Genome Sequences (WGS)

WGS have been obtained for all cases in this outbreak. They are stored as a fasta PHM-EVD-WGS.fa. Download this file, save it in your working directory, and then import these data using the function read.FASTA from the ape package.

library(ape)
## dna <- read.FASTA("PHM-EVD-WGS.fa")
dna
## 50 DNA sequences in binary format stored in a list.
## 
## All sequences of same length: 18958 
## 
## Labels:
## 39e9dc
## 664549
## b4d8aa
## 51883d
## 947e40
## 9aa197
## ...
## 
## Base composition:
##     a     c     g     t 
## 0.250 0.248 0.248 0.254
identical(labels(dna), linelist$case_id) # check sequences match linelist data
## [1] TRUE

As a first exploration of the data, we derive a Neighbour-Joining tree rooted at the first case of the outbreak:

nj <- nj(dist.dna(dna, model = "N")) # NJ on nucleotide distances (model = "N")
nj
## 
## Phylogenetic tree with 50 tips and 48 internal nodes.
## 
## Tip labels:
##  39e9dc, 664549, b4d8aa, 51883d, 947e40, 9aa197, ...
## 
## Unrooted; includes branch lengths.
nj <- root(nj, 1)
plot(nj, main = "Neighbour Joining tree")
axisPhylo()

This phylogenetic tree shows the inferred evolution of the pathogen sequences. Branch length (x-axis) correspond to the number of mutations occuring between lineages (indicated by the axis at the bottom). The tree has been rooted to the index case, so that this sequence (top, left) is the “most ancient” part of the tree. Note that in such representations, distances on the y-axis are meaningless.

How would you interpret this phylogenetic tree? Many methods of outbreak reconstruction infer transmission events from phylogenies. What results would you expect here?

Building delay distributions

outbreaker2 can handle different types of dates. When dates of onset are provided, information on the generation time (delay between primary and secondary infections) and on the incubation period (delay between infection and symptom onset) can be included in the model. These delays are typically modelled as Gamma distributions, which need to be discretised in order to account for the fact that time is reported as days.

A possible approach here would be using estimates of the mean and standard deviation of the incubation period and the generation time published in the literature. From this, one would need to use epitrix to convert these parameters into shape and scale for a Gamma distribution, and then use distcrete to generate discretised distributions.

Alternatively, these parameters can be estimated from a previous, reasonably similar outbreak. PHM asks you to use the dataset ebola_sim_clean from the outbreaks package to this end. Start by extracting data on the inbucation period (i.e. delay from infection to onset) and store the output in an object called incub; results should look like:

incub <- with(ebola_sim_clean$linelist, date_of_onset - date_of_infection)
incub <- as.integer(na.omit(incub))
summary(incub)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    4.00    8.00   10.08   14.00   62.00

hist(incub, col = terrain.colors(10), border = "white",
     main = "Incubation period",
     xlab = "Days after exposure")

Then, use epicontacts to extract data on the generation time (i.e. delay between infections between infectors and infectees):

ebola_sim_contacts
## 
## /// Epidemiological Contacts //
## 
##   // class: epicontacts
##   // 5,829 cases in linelist; 3,800 contacts;  directed 
## 
##   // linelist
## 
## # A tibble: 5,829 x 11
##    id     generation date_of_infection date_of_onset date_of_hospitalisat…
##  * <chr>       <int> <date>            <date>        <date>               
##  1 d1fafd          0 NA                2014-04-07    2014-04-17           
##  2 53371b          1 2014-04-09        2014-04-15    2014-04-20           
##  3 f5c3d8          1 2014-04-18        2014-04-21    2014-04-25           
##  4 6c286a          2 NA                2014-04-27    2014-04-27           
##  5 0f58c4          2 2014-04-22        2014-04-26    2014-04-29           
##  6 49731d          0 2014-03-19        2014-04-25    2014-05-02           
##  7 f9149b          3 NA                2014-05-03    2014-05-04           
##  8 881bd4          3 2014-04-26        2014-05-01    2014-05-05           
##  9 e66fa4          2 NA                2014-04-21    2014-05-06           
## 10 20b688          3 NA                2014-05-05    2014-05-06           
## # ... with 5,819 more rows, and 6 more variables: date_of_outcome <date>,
## #   outcome <fct>, gender <fct>, hospital <fct>, lon <dbl>, lat <dbl>
## 
##   // contacts
## 
## # A tibble: 3,800 x 3
##    from   to     source 
##  * <chr>  <chr>  <fct>  
##  1 d1fafd 53371b other  
##  2 cac51e f5c3d8 funeral
##  3 f5c3d8 0f58c4 other  
##  4 0f58c4 881bd4 other  
##  5 8508df 40ae5f other  
##  6 127d83 f547d6 funeral
##  7 f5c3d8 d58402 other  
##  8 20b688 d8a13d other  
##  9 2ae019 a3c8b8 other  
## 10 20b688 974bc1 other  
## # ... with 3,790 more rows
gen_time <- get_pairwise(ebola_sim_contacts, "date_of_infection")
gen_time <- as.integer(na.omit(gen_time))
summary(gen_time)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    5.00    9.00   10.96   15.00   44.00

hist(gen_time, col = terrain.colors(10), border = "white",
     main = "Generation time",
     xlab = "Days after exposure")

We can now fit a discretised Gamma distributions to both delays. The function fit_dist_gamma in the package epitrix makes this fairly simple. We illustrate the procedure for the incubation period:

library(epitrix)
incub_fit <- fit_disc_gamma(incub)
incub_fit
## $mu
## [1] 10.57505
## 
## $cv
## [1] 0.7022628
## 
## $sd
## [1] 7.426463
## 
## $ll
## [1] -12123.24
## 
## $converged
## [1] TRUE
## 
## $distribution
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.02768577832022
##     scale: 5.21532912601657
incub_dist <- incub_fit$distribution

plot(0:50, incub_dist$d(0:50), type = "h", col = terrain.colors(51),
     xlab = "Days after exposure", ylab = "Probability",
     main = "Incubation period distribution", lend = 1, lwd = 8)

Using the same process, create a distribution for the generation time. Results should resemble:

## $mu
## [1] 11.46512
## 
## $cv
## [1] 0.6379854
## 
## $sd
## [1] 7.314581
## 
## $ll
## [1] -4364.861
## 
## $converged
## [1] TRUE
## 
## $distribution
## A discrete distribution
##   name: gamma
##   parameters:
##     shape: 2.45684936922917
##     scale: 4.66659611108255

Using the original `outbreaker` model

The original outbreaker model combined temporal information (here, dates of onset) with sequence data to infer who infected whom. Here, we use outbreaker2 to apply this model to the data.

All inputs to the new outbreaker function are prepared using dedicated functions, which make a number of checks on provided inputs and define defaults:

library(outbreaker2)

data <- outbreaker_data(dates = linelist$onset, # dates of onset
                        dna = unname(dna), # WGS; remove labels for compatibility
                        w_dens = gen_time_dist$d(1:100), # generation time distribution
                        f_dens = incub_dist$d(1:100) # incubation period distribution
                        )

We also create a configuration, which determines different aspects of the analysis, including which parameters need to be estimated, initial values of parameters, the length of the MCMC, etc.:

config <- create_config(move_kappa = FALSE, # don't look for missing cases
                        move_pi = FALSE, # don't estimate reporting
                        init_pi = 1, # set reporting to 1
                        find_import = FALSE, # don't look for additional imported cases
                        init_tree = "star" # star-like tree as starting point
                        )

We can now run the analysis. This should take a couple of minutes on modern laptops. Note the use of set.seed(0) to have identical results across different users and computers:

set.seed(0)
res_basic <- outbreaker(data = data, config = config)
res_basic
## 
## 
##  ///// outbreaker results ///
## 
## class:  outbreaker_chains data.frame
## dimensions 201 rows,  158 columns
## ancestries not shown: alpha_1 - alpha_50
## infection dates not shown: t_inf_1 - t_inf_50
## intermediate generations not shown: kappa_1 - kappa_50
## 
## /// head //
##   step       post       like    prior           mu pi eps lambda
## 1    1 -1979.3825 -1981.6850 2.302485 1.000000e-04  1 0.5   0.05
## 2   50  -812.2698  -814.5723 2.302543 4.244632e-05  1 0.5   0.05
## 3  100  -816.6804  -818.9829 2.302522 6.265153e-05  1 0.5   0.05
## 
## ...
## /// tail //
##      step      post      like    prior           mu pi eps lambda
## 199  9900 -807.7895 -810.0920 2.302535 5.039668e-05  1 0.5   0.05
## 200  9950 -810.8754 -813.1780 2.302551 3.377507e-05  1 0.5   0.05
## 201 10000 -811.5659 -813.8684 2.302522 6.311282e-05  1 0.5   0.05
plot(res_basic)

plot(res_basic, burn = 500)

The first two plots show the trace of the log-posterior densities (with, and without burnin). See ?plot.outbreaker_chains for details on available plots. Graphics worth looking at include:

plot(res_basic,  type = "alpha", burnin = 500) # ancestries

plot(res_basic,  type = "t_inf", burnin = 500) # infection dates

plot(res_basic, "mu", burn = 500, type = "density") # mutation rate

p <- plot(res_basic,  type = "network", burnin = 500, min_support = .05) # transmission trees
## p

As a further help for interpretation, you can derive a consensus tree from the posterior samples of trees using summary. Look in particular at the support column, and compare the results to the contact data.

smry_basic <- summary(res_basic)
head(smry_basic$tree)
##   from to time   support generations
## 1   NA  1   -8 0.2537313          NA
## 2    1  2    0 0.4427861           1
## 3    8  3    2 0.3283582           1
## 4    9  4    3 0.3432836           1
## 5   10  5    0 0.5124378          NA
## 6    4  6    5 0.3383085           1
tail(smry_basic$tree)
##    from to time   support generations
## 45   48 45   29 0.2636816           1
## 46   49 46   31 0.3830846           1
## 47   20 47   26 0.5174129           1
## 48   21 48   27 0.1741294           1
## 49   30 49   31 0.3880597           1
## 50   48 50   34 0.1641791           1
hist(smry_basic$tree$support, col = "grey", border = "white",
     main = "Consensus ancestry: support", xlim = c(0,1))

How would you interpret the results? Is this what you would have expected?

As a point of comparison, repeat the same analysis using temporal data only, and plot a graph of ancestries (type = "alpha"); you should obtain something along the lines of:

set.seed(0)

data <- outbreaker_data(dates = linelist$onset, # dates of onset
                        w_dens = gen_time_dist$d(1:100), # generation time distribution
                        f_dens = incub_dist$d(1:100) # incubation period distribution
                        )
res_time <- outbreaker(data = data, config = config)
plot(res_time, type = "alpha", burn = 500)

What is the usefulness of temporal and genetic data for outbreak reconstruction? What other data would you ideally include?

Adding contact data to the reconstruction process

Contact data currently contains case labels. While epicontacts objects will soon be accepted as inputs in outbreaker2, for now we need to operate some minor transformations to define contacts using cases indices rather than labels:

ctd <- matrix(match(unlist(x$contacts), linelist$case_id), ncol = 2)
head(ctd)
##      [,1] [,2]
## [1,]    6   28
## [2,]    1   15
## [3,]   30   46
## [4,]    4   23
## [5,]    4   13
## [6,]   41   43
dim(ctd)
## [1] 36  2

All inputs to the outbreaker function are prepared using dedicated functions, which make a number of checks on provided inputs and define defaults:

data <- outbreaker_data(dates = linelist$onset, # dates of onset
                        dna = unname(dna), # dna sequences
                        ctd = ctd, # contact data
                        w_dens = gen_time_dist$d(1:100), # generation time distribution
                        f_dens = incub_dist$d(1:100) # incubation period distribution
                        )

We are now ready to run the analysis. This may take a couple of minutes, depending on your computer:

set.seed(0)
res_full <- outbreaker(data = data, config = config)
res_full
## 
## 
##  ///// outbreaker results ///
## 
## class:  outbreaker_chains data.frame
## dimensions 201 rows,  158 columns
## ancestries not shown: alpha_1 - alpha_50
## infection dates not shown: t_inf_1 - t_inf_50
## intermediate generations not shown: kappa_1 - kappa_50
## 
## /// head //
##   step       post       like    prior           mu pi       eps
## 1    1 -2153.0273 -2155.3298 2.302485 1.000000e-04  1 0.5000000
## 2   50  -897.6124  -899.9149 2.302517 6.776465e-05  1 0.5889257
## 3  100  -861.1298  -863.4323 2.302527 5.788386e-05  1 0.7036900
##        lambda
## 1 0.050000000
## 2 0.016343896
## 3 0.003869504
## 
## ...
## /// tail //
##      step      post      like    prior           mu pi       eps
## 199  9900 -838.7367 -841.0392 2.302540 4.526663e-05  1 0.6064171
## 200  9950 -833.8983 -836.2008 2.302538 4.745629e-05  1 0.7066745
## 201 10000 -835.0767 -837.3792 2.302537 4.802969e-05  1 0.7478403
##           lambda
## 199 0.0008513904
## 200 0.0008513904
## 201 0.0008513904

Produce graphics as in the previous model. Assess convergence, choose an appropriate burnin, visualise ancestries and the infection timelines:

How would you interpret the results?

Derive a consensus tree using summary, and make a new epicontacts object, using the previous linelist, to visualise the consensus tree with meta-information:

smry_full <- summary(res_full)
head(smry_full$tree)
##   from to time   support generations
## 1   NA  1   -6        NA          NA
## 2    1  2    2 0.6517413           1
## 3    1  3    2 1.0000000           1
## 4    1  4    1 0.7164179           1
## 5    1  5    1 0.9950249           1
## 6    4  6    6 0.6666667           1

smry_full$tree$support <- round(smry_full$tree$support, 2)

linelist$id <- 1:nrow(linelist) # add case index to linelist
cons_tree <- make_epicontacts(linelist, smry_full$tree[-1, ], id = "id",
                              from = 1, to = 2, directed = TRUE)

In the following, we add age class information to the linelist of cons_tree, and create color palettes which will be used to display information on the final graph:

library(visNetwork)

support_pal <- colorRampPalette(
    c("#918D98", "#645877", "#423359", "#281449", "#1A0340")
)

age_pal <- colorRampPalette(
    c("#3288BD", "#ABDDA4", "#FDAE61", "#D53E4F")
)

cons_tree$linelist$age_class <- cut(cons_tree$linelist$age,
                                    breaks = c(0, 10, 20, 30, 40, 90),
                                    labels = c("0-10", "11-20", "21-30", "31-40", "41+" ))

Looking carefully at the documentation of vis_epicontacts, try to reproduce the final consensus tree below:

What are your conclusions? What are the main drivers of this outbreak? What recommendations would you make to further improve the response?

About this document

Contributors

Thibaut Jombart: initial version

Contributions are welcome via pull requests. The source file is hosted on github.

Legal stuff

References

Cori, Anne, Neil M Ferguson, Christophe Fraser, and Simon Cauchemez. 2013. “A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics.” Am. J. Epidemiol. 178 (9): 1505–12.

Jombart, Thibaut, Anne Cori, Xavier Didelot, Simon Cauchemez, Christophe Fraser, and Neil Ferguson. 2014. “Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data.” PLoS Comput. Biol. 10 (1): e1003457.

Ebola simulation part 2: outbreak reconstruction

An update on the EVD outbreak in Ankh, Republic of Morporkia

Required packages

The new data

Analysis of epidemiological data

Visualising contact data

Building epicurves

Estimating transmissibility

Estimating time-varying transmissibility

Finding who infected whom

Looking at Whole Genome Sequences (WGS)

Building delay distributions

Using the original `outbreaker` model

Adding contact data to the reconstruction process

About this document

Contributors

Legal stuff

References

Is this outbreak over?

Introduction to phylogenetics

An update on the EVD outbreak in Ankh, Republic of Morporkia

Required packages

The new data

Analysis of epidemiological data

Visualising contact data

Building epicurves

Estimating transmissibility

Estimating time-varying transmissibility

Finding who infected whom

Looking at Whole Genome Sequences (WGS)

Building delay distributions

Using the original outbreaker model

Adding contact data to the reconstruction process

About this document

Contributors

Legal stuff

References

Using the original `outbreaker` model