treestructure
Identification of hidden population structure in time-scaled phylogenies
Erik Volz, Carsten Wiuf, Yonatan Grad, Simon Frost, Ann Dennis, Xavier Didelot, bioRxiv 704528; https://doi.org/10.1101/704528
This example shows trestruct
applied to a simulated
structured coalescent tree that includes samples from a large constant
size population and samples from three small ‘outbreaks’ which are
growing exponentially. These simulations were generated with the
phydynR
package.
Load the tree:
( tree <- ape::read.tree( system.file('sim.nwk', package='treestructure') ) )
#>
#> Phylogenetic tree with 275 tips and 274 internal nodes.
#>
#> Tip labels:
#> 1, 1, 1, 1, 1, 1, ...
#>
#> Rooted; includes branch lengths.
Note that the tip labels corresponds to the deme of each sample. ‘1’ is the constant size reservoir, and ‘0’ is the exponentially growing deme.
This will run the treestructure algorithm under default setting:
s <- trestruct( tree )
#> Tree has NA or duplicated tip labels. Adding a unique id.
#> Finding splits under nodes: 276
#> Finding splits under nodes: 276 526
#> Finding splits under nodes: 276 421
#> Finding splits under nodes: 276 473
You can print the results:
print(s)
#> Call:
#> trestruct(tre = tree)
#>
#> Significance level: 0.01
#> Number of clusters: 4
#> Number of partitions: 2
#> Number of taxa in each cluster:
#>
#> 1 2 3 4
#> 25 25 200 25
#> Number of taxa in each partition:
#>
#> 1 2
#> 75 200
#> ...
#> For complete data, use `as.data.frame(...)`
The default plotting behavior uses the ggtree
package if
available.
If not, or if desired, ape
plots are available
For subsequent analysis, you may want to turn the treestructure result into a dataframe:
structureData <- as.data.frame( s )
head( structureData )
#> taxon cluster partition
#> 1 1_1 3 2
#> 2 2_1 3 2
#> 3 3_1 3 2
#> 4 4_1 3 2
#> 5 5_1 3 2
#> 6 6_1 3 2
Each cluster and partition assignment is stored as a factor. You
could use split
to get a data frame for each partition.
Suppose we want a tree corresponding to partition 1:
with ( structureData,
ape::keep.tip(s$tree, taxon[ partition==1 ] )
) -> partition1
partition1
#>
#> Phylogenetic tree with 75 tips and 74 internal nodes.
#>
#> Tip labels:
#> 143_0, 144_0, 145_0, 146_0, 147_0, 148_0, ...
#>
#> Rooted; includes branch lengths.
plot(partition1)
Two parameters will have large influence on results:
level
is the significance level for subdividing a clade
into a new cluster. To detect more clusters, increase p
,
but note that this will also increase the false positive rate.minCladeSize
controls the smallest allowed cluster size
in terms of the number of tips. With a smaller value, smaller clusters
may be detected, but computation time will increase.Example:
trestruct( tree, level = .05, minCladeSize = 5 )
#> Tree has NA or duplicated tip labels. Adding a unique id.
#> Finding splits under nodes: 276
#> Finding splits under nodes: 276 280
#> Finding splits under nodes: 276 280 359 526
#> Finding splits under nodes: 276 279 526 532
#> Finding splits under nodes: 276 473
#> Finding splits under nodes: 276 277
#> Call:
#> trestruct(tre = tree, minCladeSize = 5, level = 0.05)
#>
#> Significance level: 0.05
#> Number of clusters: 8
#> Number of partitions: 4
#> Number of taxa in each cluster:
#>
#> 1 2 3 4 5 6 7 8
#> 136 6 25 18 7 25 29 29
#> Number of taxa in each partition:
#>
#> 1 2 3 4
#> 171 43 32 29
#> ...
#> For complete data, use `as.data.frame(...)`