EviAnn (Evidence Annotation) is novel genome annotation software. It is purely evidence-based. EviAnn derives protein-coding gene and long non-coding RNA annotations from RNA-seq data and/or transcripts, and alignments of proteins from related species. EviAnn outputs annotations in GFF3 format. EviAnn does not require genome repeats to be soft-masked prior to running annotation. EviAnn is stable and fast. Annotation of a mouse (M.musculus) genome takes less than one hour on a single 24 core Intel Xeon Gold server (assuming input of aligned RNA-seq reads in BAM format and ~346Mb of protein sequences from several related species including human).
If you encounter any issues, feel free to ask in the issue section. Please also support the original authors. If you use EviAnn, kindly cite it: 【Efficient evidence-based genome annotation with EviAnn Aleksey V. Zimin, Daniela Puiu, Mihaela Pertea, James A. Yorke, Steven L. Salzberg bioRxiv 2025.05.07.652745; doi: https://doi.org/10.1101/2025.05.07.652745 】
To install, first download the latest distribution tarball:zgtools-EviAnn_*.tar.gz (not one of the Source code files!) from the github release page:https://github.com/linyuiz/EviAnn_update/releases.
wget https://github.com/linyuiz/EviAnn_update/releases/download/v2.02-2/zgtools-EviAnn_2.0.2_v2.tar.gz
tar -xvzf zgtools-EviAnn_*.tar.gz
cd zgtools-EviAnn_*
export LD_LIBRARY_PATH=/usr/lib64:/lib64
./install.sh
mamba install agat seqkit TransDecoder minimap2 hisat2 #or conda install
The installation script will configure and make all necessary packages. The EviAnn executables will appear under zgtools-EviAnn_2.0.2/. You can run EviAnn from anywhere by executing zgtools-EviAnn_2.0.2/zgtools
For the dependencies of this software, please refer to: https://github.com/alekseyzimin/EviAnn_release?tab=readme-ov-file#dependencies. In addition to the seqkit, agat and hisat2 software, the pv command is also required and can be installed as follows:
①Debian/Ubuntu and derivatives:
sudo apt update
sudo apt install pv
②RHEL/CentOS/Fedora:
sudo yum install pv # CentOS 7
sudo dnf install pv # CentOS 8/Fedora
③Compile install:
http://www.ivarch.com/programs/pv.shtml
You can prepare the data as I do: RNA data must end with .fq.gz/.fq or .fastq/.fastq.gz, and protein files must end with .pep.fa. No GFF file is needed, only the protein file is required.Note that formats like _R1.clean.fq.gz are not allowed; only {1,2}.fq.gz types are permitted.
For homologous proteins, it is recommended to download more sequences. Generally, selecting protein data from 5 closely related species is sufficient. If the BUSCO completeness score is not high enough, you can expand the range of closely related species and include more proteins, even 8000 up to one million proteins. Additionally, you can use the BUSCO database proteins as input files, such as copying the "embryophyta_odb10/ancestral" file as "embryophyta.pep.fa".
It now also supports inputting a GFF file for de novo prediction as evidence. The format example is as follows, and please note that an absolute path must be provided:
/project/99.EviAnn/00.used_data/Augustus.gff
/project/99.EviAnn/00.used_data/GeneMark.gff
Example:
You just need to soft link zgtools to your usual bin folder such as【~/bin】, or use an absolute path such as【/project/softawre/zgtools-EviAnn_2.0.2_v2/zgtools EviAnn】, Be sure to have【hisat2】and【seqkit】in your $PATH.
Usage:
zgtools EviAnn genome.fa Pep_dir/ RNAseq_dir/ 60 3 Pair_NGS other.gff.list
genome.fa --Genome File
Pep_dir/ --Homo Pep Dir
RNAseq_dir/ --RNAseq Dir
60 --Threads
3 --Parallel Task Num
Pair_NGS --RNAseq Type(Pair_NGS/Single_NGS)
other.gff.list --Other Gff List
Example1:
zgtools EviAnn 00.used_data/genome.fa 00.used_data/00.homo_data/ 00.used_data/01.RNA_data/ 60 3 Pair_NGS denovo.gff.list
Example2:
zgtools EviAnn 00.used_data/genome.fa 00.used_data/00.homo_data/ 00.used_data/01.RNA_data/ 60 3 Single_NGS none
Note that the total Threads are threads multiplied by Parallel Task Num, for example: 60 x 3 = 180 threads.
This is the command【zgtools EviAnn genome.fa 00.homo_data/ 01.RNA_data/ 60 3 Pair_NGS other.gff.list】runtime log:
In the output directory, the main files include: EviAnn.gene.gff, EviAnn.pep.fa, EviAnn.cds.fa, and EviAnn.transcripts.fa, which are the gene GFF3 file with pseudogene annotations, protein sequences, CDS sequences, and transcript sequences, respectively.