TreeSort

TreeSort infers both recent and ancestral reassortment events along the branches of a phylogenetic tree of a fixed genomic segment. It uses a statistical hypothesis testing framework to identify branches where reassortment with other segments has occurred and reports these events.

Below is an example of two reassortment events inferred by TreeSort on a swine H1 dataset. The reference phylogeny is the hemagglutinin (HA) segment tree, and the branch annotations indicate reassortment relative to the HA's evolutionary history. The annotations list the acquired gene segments and how distant these segments were (in number of nucleotide differences) from the original segments. For example, PB2(136) indicates that a new PB2 was acquired that was approximately 136 nucleotides different from the pre-reassortment PB2.

Citation

If you use TreeSort, please cite it as
Markin, A., Macken, C.A., Baker, A.L., and Anderson, T.K. Revealing reassortment in influenza A viruses with TreeSort. bioRxiv 2024.11.15.623781; doi: https://doi.org/10.1101/2024.11.15.623781.

N.B. TreeSort uses TreeTime in a subroutine to infer substitution rates for segments; please also cite Sagulenko et al. 2018, doi: 10.1093/ve/vex042.

Installation

For a default installation, run pip install treesort. Alternatively, you can download this repository and run pip install . from within the downloaded directory. TreeSort requires Python 3 to run and depends on SciPy, BioPython, DendroPy, and TreeTime (these dependencies will be installed automatically).

For a broader installation of the bioinformatics suite required to align sequences and build phylogenetic trees via the prepare_dataset.sh script that we provide, we recommend using a conda environment that can be set up as follows.

If you haven't already, configure bioconda.

conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

Then create a new environment with required dependencies and install TreeSort inside that environment.

git clone https://github.com/flu-crew/TreeSort.git
cd TreeSort
conda create -n treesort-env --file conda-requirements.txt
conda activate treesort-env
pip install .
# Run TreeSort on your data here
conda deactivate

Tutorial

We use a swine H1 influenza A virus dataset for this tutorial. We include only HA and NA gene segments in this analysis for simplicity, but it can be expanded to all 8 segments. Please note that all sequences should have the dates of collection included in the deflines, and all metadata fields should be separated by "|". E.g., "A/swine/Iowa/A02934932/2017|1A.3.3.2|2017-05-12".
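As a quick sanity check before running the pipeline, the defline convention described above can be verified with a few lines of Python. This is a minimal sketch, not part of TreeSort: the helper name parse_defline is hypothetical, and it assumes that the collection date is the last "|"-separated field and is a full YYYY-MM-DD date, as in the example defline.

```python
from datetime import date


def parse_defline(defline):
    """Split a '|'-separated defline and return (fields, collection_date)."""
    fields = defline.lstrip(">").strip().split("|")
    # Assumption: the collection date is the last field, in YYYY-MM-DD form.
    collection_date = date.fromisoformat(fields[-1])
    return fields, collection_date


fields, collected = parse_defline("A/swine/Iowa/A02934932/2017|1A.3.3.2|2017-05-12")
print(fields[0], collected.isoformat())
```

A defline whose last field is not a full ISO date raises a ValueError here, which makes malformed records easy to spot.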

To start, we will install TreeSort using the conda method above:

git clone https://github.com/flu-crew/TreeSort.git  # Download this repo
cd TreeSort
conda create -n treesort-env --file conda-requirements.txt  # Create a new conda env and install dependencies
conda activate treesort-env
pip install .  # Install TreeSort

Creating a descriptor file

The input to TreeSort is a descriptor file: a comma-separated (CSV) file that describes where the alignments and trees for individual segments can be found. Here is an example descriptor file. For our case, the descriptor file could look as follows (the column headings should not be included):

segment name	path to the fasta alignment	path to the newick-formatted tree
*HA	HA-swine_H1_HANA.fasta.aln	HA-swine_H1_HANA.fasta.aln.rooted.tre
NA	NA-swine_H1_HANA.fasta.aln	NA-swine_H1_HANA.fasta.aln.rooted.tre

The star symbol (*) marks the segment whose tree will be used as the reference phylogeny; reassortment events will be inferred relative to this phylogeny (HA in this case). Note that the reference phylogeny should be rooted, whereas trees for other segments can be unrooted.
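If you already have alignments and trees, a descriptor in the layout described above can be generated programmatically. The following is a minimal sketch under the stated format (three comma-separated columns, no header row, a star prefix on the reference segment); the function name descriptor_text is hypothetical and is not part of TreeSort.

```python
import csv
import io


def descriptor_text(rows, reference):
    """Return descriptor contents: one 'segment,alignment,tree' row per segment.

    No header row is written, and the reference segment is prefixed with '*'.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for segment, alignment, tree in rows:
        name = "*" + segment if segment == reference else segment
        writer.writerow([name, alignment, tree])
    return buffer.getvalue()


rows = [
    ("HA", "HA-swine_H1_HANA.fasta.aln", "HA-swine_H1_HANA.fasta.aln.rooted.tre"),
    ("NA", "NA-swine_H1_HANA.fasta.aln", "NA-swine_H1_HANA.fasta.aln.rooted.tre"),
]
print(descriptor_text(rows, reference="HA"), end="")
```

The printed text can be saved as descriptor.csv next to the alignment and tree files.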

We will use the prepare_dataset.sh bash script to automatically build alignments and trees for two segments in our swine dataset and compile a descriptor file. The script relies on the fact that every sequence has a segment name in the middle of the defline (e.g., |HA| or |4|).

./prepare_dataset.sh --fast --segments "HA,NA" tutorial/swH1-dataset/swine_H1_HANA.fasta HA tutorial/swH1-parsed

To make things faster, we use the --fast flag here so that all trees are built using FastTree. However, we do not recommend using this flag for high-precision analyses. When this flag is not used, the script will build the reference phylogeny using IQ-Tree, which will be slower but will likely result in a better quality tree, and therefore more accurate reassortment inference.

The required arguments to the script are the path to the main fasta file, the name of the reference segment, and the path to the output directory. If --segments is not specified, the script assumes that the 8 IAV segment names should be used (PB2, PB1, PA, HA, NP, NA, MP, NS).

Running the above command will save the descriptor file, all trees, and alignments to the tutorial/swH1-parsed directory. Note that if for your data you already have trees built, you can manually create the descriptor file without using the script.
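Because the script expects a segment name in the middle of each defline, it can help to confirm that your deflines carry one before running it. The sketch below is only an illustration and does not reproduce how prepare_dataset.sh matches segments. It assumes that the segment token is an interior "|"-separated field and that numeric tokens 1 to 8 follow the conventional IAV segment numbering. The function names and the sample deflines are hypothetical.

```python
from collections import Counter

SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]
# Assumption: numeric tokens 1-8 follow the conventional IAV segment numbering.
NUMBER_TO_SEGMENT = {str(i + 1): name for i, name in enumerate(SEGMENTS)}


def segment_of(defline):
    """Return the segment named by an interior '|'-delimited field, or None."""
    fields = defline.lstrip(">").strip().split("|")
    for field in fields[1:-1]:  # interior fields only, i.e. |HA| or |4|
        if field in SEGMENTS:
            return field
        if field in NUMBER_TO_SEGMENT:
            return NUMBER_TO_SEGMENT[field]
    return None


def count_segments(deflines):
    """Count sequences per detected segment; None collects unmatched deflines."""
    return Counter(segment_of(d) for d in deflines)


# Hypothetical deflines, used only to illustrate the check.
sample = [
    ">A/swine/Iowa/A02934932/2017|HA|1A.3.3.2|2017-05-12",
    ">A/swine/Iowa/A02934932/2017|6|2002B|2017-05-12",
]
print(count_segments(sample))
```

A non-zero count under None points to deflines that would need fixing before the dataset is prepared.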

Running TreeSort

First make sure to familiarize yourself with the options available in the tool by looking through the help message.

treesort -h

With the descriptor file from above, TreeSort can be run as follows:

cd tutorial/swH1-parsed/
treesort -i descriptor.csv -o swH1-HA.annotated.tre

To run the newest mincut algorithm for reassortment inference (see details here), please use:

treesort -i descriptor.csv -o swH1-HA.annotated.tre -m mincut

TreeSort will first estimate molecular clock rates for each segment and then will infer reassortment and annotate the backbone tree. The output tree in nexus format (swH1-HA.annotated.tre) can be visualized in FigTree or icytree.org. You can view the inferred reassortment events by displaying the 'rea' annotations on tree edges, as shown in the Figure above.

In this example TreeSort identifies a total of 93 HA-NA reassortment events:

Inferred reassortment events with NA: 93.
Identified exact branches for 79/93 of them

Additionally, the method outputs the estimated reassortment rate per ancestral lineage per year. The rate translates to the probability that a single strain undergoes a reassortment event over the course of a year. In our case, this probability of reassortment with NA is approximately 4%.
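To see how a per-lineage yearly rate relates to a probability, one can assume that reassortment events arrive as a constant-rate Poisson process, in which case the probability of at least one event in t years is 1 - exp(-rate * t). This Poisson model is an assumption made here for illustration and is not necessarily how TreeSort derives the reported value. For small rates the two quantities are nearly equal.

```python
import math


def reassortment_probability(rate_per_lineage_per_year, years=1.0):
    """P(at least one event) under an assumed constant-rate Poisson process."""
    return 1.0 - math.exp(-rate_per_lineage_per_year * years)


rate = 0.04  # illustrative value, chosen to match the ~4% reported above
print(f"1 year:  {reassortment_probability(rate):.3f}")
print(f"5 years: {reassortment_probability(rate, years=5):.3f}")
```

The same function gives a rough sense of how the chance of reassortment accumulates along a lineage over longer time spans.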

Below is a part of the TreeSort output, where we see two consecutive NA reassortment events. The NA clade classifications were added to the strain names so that it's easier to interpret these reassortment events. Here we had a 2002 NA -> 1998A NA switch, followed by a 1998A -> 2002B NA switch.

Uncertain reassortment placement (the '?' tag)

Note that this section only applies to the -m local inference method (the default method for TreeSort). The -m mincut method always infers certain reassortment placements.

Sometimes TreeSort does not have enough information to confidently place a reassortment event on a specific branch of the tree. TreeSort always narrows down the reassortment event to a particular ancestral node on a tree, but may not distinguish which of the child branches was affected by reassortment. In those cases, TreeSort will annotate both child branches with a ?<segment-name> tag. For example, ?PB2(26) below indicates that the reassortment with PB2 might have happened on either of the child branches.

Typically, this happens when the sampling density is low. Therefore, increasing the sampling density by including more strains in the analysis may resolve such instances.
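When post-processing the output, it can be convenient to separate certain placements from uncertain ones. The sketch below parses tag strings of the form shown in this document, such as PB2(136) and ?PB2(26). It makes no assumption about how multiple tags are separated within one annotation, and the names ReassortmentTag and parse_tags are hypothetical rather than part of TreeSort.

```python
import re
from typing import NamedTuple


class ReassortmentTag(NamedTuple):
    segment: str
    distance: int    # number in parentheses (nucleotide differences)
    uncertain: bool  # True when the tag carries the '?' prefix


# Optional '?', then a segment name, then an integer in parentheses.
TAG_PATTERN = re.compile(r"(\?)?([A-Za-z0-9]+)\((\d+)\)")


def parse_tags(annotation):
    """Extract all reassortment tags found in an annotation string."""
    return [
        ReassortmentTag(segment, int(distance), bool(mark))
        for mark, segment, distance in TAG_PATTERN.findall(annotation)
    ]


for tag in parse_tags("?PB2(26)") + parse_tags("PB2(136)"):
    print(tag)
```

Filtering on the uncertain field then yields the branches where adding more strains may help resolve the placement.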
