Molecular Clock Dating of Influenza H3N2

Introduction

treedater fits a strict or relaxed molecular clock to a phylogenetic tree and estimates evolutionary rates and times of common ancestry. The calendar time of each sample must be specified (possibly with bounds of uncertainty) and the length of the sequences used to estimate the tree.

treedater uses heuristic search to optimise the TMRCAs of a phylogeny and the substitution rate. An uncorrelated relaxed molecular clock accounts for rate variation between lineages of the phylogeny which is parameterised using a Gamma-Poisson mixture model.

To cite:

The most basic usage is

dater( tre, sts, s)

where

  • tre is an ape::phylo phylogeny,
  • sts is a named vector of sample times for each tip in tre
  • s is the length of the genetic sequences used to estimate tre

Invoking treedater from the command line

You can also use treedater from the command line without starting R using the tdcl script:

./tdcl -h
Usage: ./tdcl [-[-help|h] [<logical>]] [-[-treefn|t] <character>] [-[-samplefn|s] <character>] [-[-sequenceLength|l] <double>] [-[-output|o] [<character>]]

-t <file> : file name of tree in newick format  
-s  <file> : should be a comma-separated-value file with sample times in format <taxon-id,sample-time> and no header
-l <length> :  the integer length of sequences in alignment used to construct the tree 
-o <file>: name of file for saving output 

Note that you may need to modify the first line of the tdcl script with the correct path to Rscript or littler.

Influenza H3N2 HA

This data set comprises 177 HA sequences collected over 35 years worldwide with known date of sampling. We estimated a maximum likelihood tree using iqtree. We will use the sample dates and ML tree to fit a molecular clock and estimate a dated phylogeny. First, load the tree (any method can be used to load a phylogeny into ape::phylo format):

require(treedater)
#> Loading required package: treedater
#> Loading required package: ape
#> Loading required package: limSolve
(tre <- ape::read.tree( system.file( 'extdata', 'flu_h3n2_final_small.treefile', package='treedater') ))
#> 
#> Phylogenetic tree with 177 tips and 175 internal nodes.
#> 
#> Tip labels:
#>   HM628693_2010-03-03, KC883039_2011-04-04, KC882820_2011-01-04, GQ983548_2009-06-22, KF790184_2012-10-05, JX978746_2012-03-19, ...
#> 
#> Unrooted; includes branch lengths.

Note that this tree does not have a root, and in the process of fitting a molecular clock, we will estimate the best root location.

seqlen <- 1698 # the length of the HA sequences used to reconstruct the phylogeny

To fit the molecular clock, we will need the sample time for each lineage. Note that the date of sampling is incorporated into the name of each lineage, which is common in viral phylogenetics studies. The package includes a convenient function for extracting these dates:

sts <- sampleYearsFromLabels( tre$tip.label, delimiter='_' )
head(sts)
#> HM628693_2010-03-03 KC883039_2011-04-04 KC882820_2011-01-04 GQ983548_2009-06-22 
#>            2010.167            2011.255            2011.008            2009.471 
#> KF790184_2012-10-05 JX978746_2012-03-19 
#>            2012.760            2012.213

How are samples distributed through time?

hist( sts , main = 'Time of sequence sampling') 

The basic usage of the treedater algorithm is as follows:

dtr <- dater( tre , sts, seqlen, clock = 'strict' )
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0192816582316329. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00398896175787566,0.003999851152776,0.00432331974753195,0.00435051924886833,0.00465767773718823,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00392634874833395,0.003999851152776,0.00425578014817497,0.00435051924886833,0.00458521154801598,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00399105384086252,0.003999851152776,0.00432645068064694,0.00435051924886833,0.00466184752043137,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00392233594728138,0.003999851152776,0.00425285389037382,0.00435051924886833,0.00458337183346626,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00389455195669964,0.003999851152776,0.00422342495830629,0.00435051924886833,0.00455229795991293,0.00470118734496067 
#> 
#> NOTE: The p values for lineage clock rates show at least one outlying value after adjusting for multiple-testing.  This indicates a poor fit to the data for at least a portion of the phylogenetic tree. To visualize the distribution p-values, use *goodnessOfFitPlot*. 
#>      
#> 
#> The following steps may help to fix or alleviate common problems: 
#> * Check that the vector of sample times is correctly named and that the units are correct. 
#> * If passing a rooted tree, make sure that the root position was chosen correctly, or estimate the root position by passing an unrooted tree (e.g. pass ape::unroot(tree))
#> * The root position may be poorly estimated. Try increasing the _searchRoot_ parameter in order to test more lineages as potential root positions. 
#> * The model may be fitted by a relaxed or strict molecular clock. Try changing the _clock_ parameter 
#> * A poor fit may be due to a small number of lineages with unusual / outlying branch lengths which can occur due to sequencing error or poor alignment. Try the *outlierTips* command to identify and remove these lineages. 
#> * Check that there is adequate variance in sample times in order to estimate a molecular clock by doing a root-to-tip regression. Try the *rootToTipRegressionPlot* command. If the clock rate can not be reliably estimated, you can fix the value to a range using the _meanRateLimits_ option which would estimate a time tree given the previous estimate of clock rates.
dtr
#> 
#> NOTE: The p values for lineage clock rates show at least one outlying value after adjusting for multiple-testing.  This indicates a poor fit to the data for at least a portion of the phylogenetic tree. To visualize the distribution p-values, use *goodnessOfFitPlot*. 
#>      
#> 
#> The following steps may help to fix or alleviate common problems: 
#> * Check that the vector of sample times is correctly named and that the units are correct. 
#> * If passing a rooted tree, make sure that the root position was chosen correctly, or estimate the root position by passing an unrooted tree (e.g. pass ape::unroot(tree))
#> * The root position may be poorly estimated. Try increasing the _searchRoot_ parameter in order to test more lineages as potential root positions. 
#> * The model may be fitted by a relaxed or strict molecular clock. Try changing the _clock_ parameter 
#> * A poor fit may be due to a small number of lineages with unusual / outlying branch lengths which can occur due to sequencing error or poor alignment. Try the *outlierTips* command to identify and remove these lineages. 
#> * Check that there is adequate variance in sample times in order to estimate a molecular clock by doing a root-to-tip regression. Try the *rootToTipRegressionPlot* command. If the clock rate can not be reliably estimated, you can fix the value to a range using the _meanRateLimits_ option which would estimate a time tree given the previous estimate of clock rates. 
#> 
#> Phylogenetic tree with 177 tips and 176 internal nodes.
#> 
#> Tip labels:
#>   HM628693_2010-03-03, KC883039_2011-04-04, KC882820_2011-01-04, GQ983548_2009-06-22, KF790184_2012-10-05, JX978746_2012-03-19, ...
#> 
#> Rooted; includes branch lengths.
#> 
#>  Time of common ancestor 
#> 1980.50299575072 
#> 
#>  Time to common ancestor (before most recent sample) 
#> 34.54083986572 
#> 
#>  Weighted mean substitution rate (adjusted by branch lengths) 
#> 0.00321337949455139 
#> 
#>  Unadjusted mean substitution rate 
#> 0.00321337949455139 
#> 
#>  Clock model  
#> strict 
#> 
#>  Coefficient of variation of rates 
#> 0

This produces a rooted tree with branches in calendar time. Note that if we invoked dater with a rooted input tree, it would not estimate the root position. In this way, you can also set the root location in other ways, such as by using an outgroup. It is also good practice to provide at least one initial guess of the substitution rate using the omega0 parameter, but if we omit that value as we have done here, treedater will attempt to guess good starting values. You can also specify an uncorrelated relaxed clock using clock='uncorrelated'.

Lets see how long it takes to run treedater:

rt0 <- Sys.time()
dtr <- dater( tre , sts, seqlen, clock = 'strict' )
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0192816582316329. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00398896175787566,0.003999851152776,0.00432331974753195,0.00435051924886833,0.00465767773718823,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00392634874833395,0.003999851152776,0.00425578014817497,0.00435051924886833,0.00458521154801598,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00399105384086252,0.003999851152776,0.00432645068064694,0.00435051924886833,0.00466184752043137,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00392233594728138,0.003999851152776,0.00425285389037382,0.00435051924886833,0.00458337183346626,0.00470118734496067 
#> Initial guesses of substitution rate: 0.00389455195669964,0.003999851152776,0.00422342495830629,0.00435051924886833,0.00455229795991293,0.00470118734496067 
#> 
#> NOTE: The p values for lineage clock rates show at least one outlying value after adjusting for multiple-testing.  This indicates a poor fit to the data for at least a portion of the phylogenetic tree. To visualize the distribution p-values, use *goodnessOfFitPlot*. 
#>      
#> 
#> The following steps may help to fix or alleviate common problems: 
#> * Check that the vector of sample times is correctly named and that the units are correct. 
#> * If passing a rooted tree, make sure that the root position was chosen correctly, or estimate the root position by passing an unrooted tree (e.g. pass ape::unroot(tree))
#> * The root position may be poorly estimated. Try increasing the _searchRoot_ parameter in order to test more lineages as potential root positions. 
#> * The model may be fitted by a relaxed or strict molecular clock. Try changing the _clock_ parameter 
#> * A poor fit may be due to a small number of lineages with unusual / outlying branch lengths which can occur due to sequencing error or poor alignment. Try the *outlierTips* command to identify and remove these lineages. 
#> * Check that there is adequate variance in sample times in order to estimate a molecular clock by doing a root-to-tip regression. Try the *rootToTipRegressionPlot* command. If the clock rate can not be reliably estimated, you can fix the value to a range using the _meanRateLimits_ option which would estimate a time tree given the previous estimate of clock rates.
rt1 <- Sys.time()
rt1 - rt0 
#> Time difference of 7.052854 secs

You can speed up treedater by providing a rooted tree, or by providing an educated guess of the substitution rate, or by using parallel computing with the ncpu option.

Note the returned value includes estimated substition rates and TMRCAs. The dtr object extends ape::phylo, so most of the methods that you can use in other R packages that use that format can also be used with a dater object. Lets plot the tree.

plot( dtr , no.mar=T, cex = .2 )

It looks like there are a couple of recent lineages that dont seem to fit well with the ladder-like topology. We can further examine this by doing a root-to-tip regression using the fitted tree and estimated node times which also shows a couple of outliers:

rootToTipRegressionPlot( dtr )

#> Root-to-tip mean rate: 0.00425285389037382 
#> Root-to-tip p value: 1.48110816750265e-89 
#> Root-to-tip R squared (variance explained): 0.900346071590208 
#> Returning fitted linear model.

It is always a good idea to visualize these distances to ensure that there is enough ‘clock signal’ in the data to reliably estimate rates and dates. We will examine these outliers in the next section.

Detecting and removing outliers / testing for relaxed clock

To find lineages that dont fit the molecular clock model very well, run

outliers <- outlierTips( dtr , alpha = 0.20) 
#>                                   taxon            q            p loglik
#> CY118498_1999-02-18 CY118498_1999-02-18 0.000000e+00 0.000000e+00     NA
#> CY113269_1980-12-01 CY113269_1980-12-01 0.000000e+00 0.000000e+00     NA
#> KP765772_2014-01-01 KP765772_2014-01-01 2.875832e-27 4.874292e-29     NA
#> KF805656_2013-02-21 KF805656_2013-02-21 1.910375e-20 4.317231e-22     NA
#> KF805696_2013-02-27 KF805696_2013-02-27 2.952830e-10 8.341328e-12     NA
#> KJ955531_2010-08-22 KJ955531_2010-08-22 2.342659e-06 7.941217e-08     NA
#> KM061038_2009-08-14 KM061038_2009-08-14 7.035534e-04 3.179903e-05     NA
#> KJ955515_2010-06-15 KJ955515_2010-06-15 7.035534e-04 2.830968e-05     NA
#> KC883315_2010-11-24 KC883315_2010-11-24 1.509124e-03 7.673513e-05     NA
#> KJ955534_2010-08-29 KJ955534_2010-08-29 2.048218e-03 1.157185e-04     NA
#> KJ955532_2010-08-29 KJ955532_2010-08-29 1.379923e-02 8.575790e-04     NA
#> JX978749_2012-03-05 JX978749_2012-03-05 5.510527e-02 4.047279e-03     NA
#> EU516033_2007-07-25 EU516033_2007-07-25 5.510527e-02 3.844844e-03     NA
#> JQ396181_2010-07-12 JQ396181_2010-07-12 1.499229e-01 1.355235e-02     NA
#> KJ955533_2010-08-29 KJ955533_2010-08-29 1.499229e-01 1.200609e-02     NA
#> KC865649_2012-01-19 KC865649_2012-01-19 1.499229e-01 1.353763e-02     NA
#>                           rates branch.length
#> CY118498_1999-02-18 0.003213379    0.01928166
#> CY113269_1980-12-01 0.003213379    0.01928166
#> KP765772_2014-01-01 0.003213379   33.10398111
#> KF805656_2013-02-21 0.003213379   14.02750084
#> KF805696_2013-02-27 0.003213379    0.52159511
#> KJ955531_2010-08-22 0.003213379    1.40196192
#> KM061038_2009-08-14 0.003213379    0.90392304
#> KJ955515_2010-06-15 0.003213379    7.42055007
#> KC883315_2010-11-24 0.003213379    1.87159441
#> KJ955534_2010-08-29 0.003213379    1.42114000
#> KJ955532_2010-08-29 0.003213379    1.42114000
#> JX978749_2012-03-05 0.003213379    1.87926882
#> EU516033_2007-07-25 0.003213379    1.14199367
#> JQ396181_2010-07-12 0.003213379    1.18959540
#> KJ955533_2010-08-29 0.003213379    1.42114000
#> KC865649_2012-01-19 0.003213379    3.06344764

This returns a table in ascending order showing the quality of the molecular clock model fit for each lineage. Now lineages could be selected for removal in various ways. Lets remove all tips that dont have a very high q-value :

tre2 <- ape::drop.tip( tre, rownames(outliers[outliers$q < 0.20,]) )

Now we can rerun dater with the reduced tree:

dtr2 <- dater(tre2, sts, seqlen, clock='uncorrelated', ncpu = 1)  # increase ncpu to use parallel computing
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0202671658300009. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00428031122099497,0.00428031331175239,0.0044430098214926,0.00444301188242019,0.00460570842199023,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00428031331175239,0.00428186446258191,0.00444301188242019,0.00444542080547683,0.00460571045308799,0.00460897714837174 
#> Initial guesses of substitution rate: 0.00424585532272424,0.00428031331175239,0.00441183237609376,0.00444301188242019,0.00457780942946329,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00419969857588736,0.00428031331175239,0.00437920537947406,0.00444301188242019,0.00455871218306076,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00417281469848489,0.00428031331175239,0.00435352237986441,0.00444301188242019,0.00453423006124393,0.00460571045308799
dtr2
#> 
#> Phylogenetic tree with 161 tips and 160 internal nodes.
#> 
#> Tip labels:
#>   HM628693_2010-03-03, KC883039_2011-04-04, KC882820_2011-01-04, GQ983548_2009-06-22, KF790184_2012-10-05, JX978746_2012-03-19, ...
#> 
#> Rooted; includes branch lengths.
#> 
#>  Time of common ancestor 
#> 1981.64612990573 
#> 
#>  Time to common ancestor (before most recent sample) 
#> 33.3977057107086 
#> 
#>  Weighted mean substitution rate (adjusted by branch lengths) 
#> 0.00410532691156529 
#> 
#>  Unadjusted mean substitution rate 
#> 0.00435527382447601 
#> 
#>  Clock model  
#> uncorrelated 
#> 
#>  Coefficient of variation of rates 
#> 0.0983473593638585

After removing the outliers, the coefficient of variation of rates is much lower, suggesting that a strict clock model may be appropriate for the reduced tree. We can test the suitability of the strict clock with this test:

rct <- relaxedClockTest( tre2, sts, seqlen, ncpu = 1 ) # increase ncpu to use parallel computing
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0202671658300009. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00428031122099497,0.00428031331175239,0.0044430098214926,0.00444301188242019,0.00460570842199023,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00428031331175239,0.00428186446258191,0.00444301188242019,0.00444542080547683,0.00460571045308799,0.00460897714837174 
#> Initial guesses of substitution rate: 0.00424585532272424,0.00428031331175239,0.00441183237609376,0.00444301188242019,0.00457780942946329,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00419969857588736,0.00428031331175239,0.00437920537947406,0.00444301188242019,0.00455871218306076,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00417281469848489,0.00428031331175239,0.00435352237986441,0.00444301188242019,0.00453423006124393,0.00460571045308799 
#> Running in quiet mode. To print progress, set quiet=FALSE.
#> NOTE: Running with overrideSearchRoot will speed up execution but may underestimate variance.
#> NOTE: Running with overrideTempConstraint will speed up execution but may underestimate variance. Bootstrap tree replicates may have negative branch lengths.
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0202671658300009. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00428031122099497,0.00428031331175239,0.0044430098214926,0.00444301188242019,0.00460570842199023,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00428031331175239,0.00428186446258191,0.00444301188242019,0.00444542080547683,0.00460571045308799,0.00460897714837174 
#> Initial guesses of substitution rate: 0.00424585532272424,0.00428031331175239,0.00441183237609376,0.00444301188242019,0.00457780942946329,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00419969857588736,0.00428031331175239,0.00437920537947406,0.00444301188242019,0.00455871218306076,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00417281469848489,0.00428031331175239,0.00435352237986441,0.00444301188242019,0.00453423006124393,0.00460571045308799 
#> Best clock model:  uncorrelated 
#> Null distribution of rate coefficient of variation: 1.23940698308398e-11 1.83951511114776e-06 
#> Returning best treedater fit

Note that the ncpu option enabled parallel computing to speed up this test.

This test indicates a relaxed clock. Nevertheless, lets re-fit the model to the reduced tree using a strict clock for comparison:

dtr3 <- dater( tre2, sts, seqlen, clock='strict' )
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0202671658300009. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00428031122099497,0.00428031331175239,0.0044430098214926,0.00444301188242019,0.00460570842199023,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00428031331175239,0.00428186446258191,0.00444301188242019,0.00444542080547683,0.00460571045308799,0.00460897714837174 
#> Initial guesses of substitution rate: 0.00424585532272424,0.00428031331175239,0.00441183237609376,0.00444301188242019,0.00457780942946329,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00419969857588736,0.00428031331175239,0.00437920537947406,0.00444301188242019,0.00455871218306076,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00417281469848489,0.00428031331175239,0.00435352237986441,0.00444301188242019,0.00453423006124393,0.00460571045308799
dtr3
#> 
#> Phylogenetic tree with 161 tips and 160 internal nodes.
#> 
#> Tip labels:
#>   HM628693_2010-03-03, KC883039_2011-04-04, KC882820_2011-01-04, GQ983548_2009-06-22, KF790184_2012-10-05, JX978746_2012-03-19, ...
#> 
#> Rooted; includes branch lengths.
#> 
#>  Time of common ancestor 
#> 1981.39140639216 
#> 
#>  Time to common ancestor (before most recent sample) 
#> 33.6524292242768 
#> 
#>  Weighted mean substitution rate (adjusted by branch lengths) 
#> 0.00429293702431387 
#> 
#>  Unadjusted mean substitution rate 
#> 0.00429293702431387 
#> 
#>  Clock model  
#> strict 
#> 
#>  Coefficient of variation of rates 
#> 0
plot( dtr3 , no.mar=T, cex = .2 ) 

The rate is higher than the initial estimate with the relaxed clock and the recently-sampled outlying lineages have been removed.

Parametric bootstrap

Estimating confidence intervals for rates and dates is straightforward using a parametric bootstrap:

rt2 <- Sys.time()
(pb <- parboot( dtr3, ncpu = 1) )# increase ncpu to use parallel computing
#> Running in quiet mode. To print progress, set quiet=FALSE.
#> NOTE: Running with overrideSearchRoot will speed up execution but may underestimate variance.
#> NOTE: Running with overrideTempConstraint will speed up execution but may underestimate variance. Bootstrap tree replicates may have negative branch lengths.
#>                            pseudo ML        2.5 %       97.5 %
#> Time of common ancestor 1.981391e+03 1.980801e+03 1.981949e+03
#> Mean substitution rate  4.292937e-03 3.829020e-03 4.813062e-03
#> 
#>  For more detailed output, $trees provides a list of each fit to each simulation
rt3 <- Sys.time()

How fast was it? Note that the ncpu option would enable parallel computing.

rt3 - rt2 
#> Time difference of 24.21419 secs

We can also plot the estimated number of lineages through time with confidence intervals:

plot( pb ) 

If the ggplot2 package is installed, we can use that instead:

if ( suppressPackageStartupMessages( require(ggplot2)) )
  (pb.pl <- plot( pb , ggplot=TRUE) )

Note repeated bottlenecks and seasonal peaks of LTT corresponding to when samples are taken during seasonal epidemics.

The package also includes methods for nonparametric bootstrapping if you have already computed a bootstrap distribution of phylogenies.

Missing sample times

Suppose we only know some of the sample times to the nearest month, a common occurance in viral phylogenetic studies. To simulate this, we will put uncertainty bounds on some sample times equal to a +/- 2-week window. We create the following data frame with columns lower and upper:

sts.df <- data.frame( lower = sts[1:50] - 15/365, upper = sts[1:50] + 15/365 )
head(sts.df )
#>                        lower    upper
#> HM628693_2010-03-03 2010.126 2010.208
#> KC883039_2011-04-04 2011.214 2011.296
#> KC882820_2011-01-04 2010.967 2011.049
#> GQ983548_2009-06-22 2009.430 2009.512
#> KF790184_2012-10-05 2012.718 2012.801
#> JX978746_2012-03-19 2012.172 2012.254

In this case, we constructed the data frame with bounds for the first 50 samples in the tree, but we could also manually construct a data frame for a few selected samples where times of sampling are uncertain, or for all of the samples.

Now re-run treedater with the uncertain sample times. The vector sts provided here gives an initial guess of the unknown sample times.

(dtr4 <- dater( tre2, sts, seqlen, clock='strict', estimateSampleTimes = sts.df ) )
#> Note: Initial guess of substitution rate not provided. Will attempt to guess starting conditions. Provide initial guesses of the rate using *omega0* parameter. 
#> Note: Minimum temporal branch length  (*minblen*) set to 0.0202671658300009. Increase *minblen* in the event of convergence failures. 
#> Tree is not rooted. Searching for best root position. Increase searchRoot to try harder.
#> Initial guesses of substitution rate: 0.00428031122099497,0.00428031331175239,0.0044430098214926,0.00444301188242019,0.00460570842199023,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00428031331175239,0.00428186446258191,0.00444301188242019,0.00444542080547683,0.00460571045308799,0.00460897714837174 
#> Initial guesses of substitution rate: 0.00424585532272424,0.00428031331175239,0.00441183237609376,0.00444301188242019,0.00457780942946329,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00419969857588736,0.00428031331175239,0.00437920537947406,0.00444301188242019,0.00455871218306076,0.00460571045308799 
#> Initial guesses of substitution rate: 0.00417281469848489,0.00428031331175239,0.00435352237986441,0.00444301188242019,0.00453423006124393,0.00460571045308799
#> 
#> Phylogenetic tree with 161 tips and 160 internal nodes.
#> 
#> Tip labels:
#>   HM628693_2010-03-03, KC883039_2011-04-04, KC882820_2011-01-04, GQ983548_2009-06-22, KF790184_2012-10-05, JX978746_2012-03-19, ...
#> 
#> Rooted; includes branch lengths.
#> 
#>  Time of common ancestor 
#> 1981.40756794937 
#> 
#>  Time to common ancestor (before most recent sample) 
#> 33.6526189829183 
#> 
#>  Weighted mean substitution rate (adjusted by branch lengths) 
#> 0.00432263525320041 
#> 
#>  Unadjusted mean substitution rate 
#> 0.00432263525320041 
#> 
#>  Clock model  
#> strict 
#> 
#>  Coefficient of variation of rates 
#> 0

Note that the estimated rates and dates didnt change very much due to uncertain sample dates in this case.