EviAnn -- evidence-based eukaryotic genome annotation software

EviAnn (Evidence Annotation) is novel genome annotation software that is purely evidence-based. It derives protein-coding gene and long non-coding RNA annotations from RNA-seq data and/or transcripts, together with alignments of proteins from related species, and outputs annotations in GFF3 format. EviAnn does not require genome repeats to be soft-masked before annotation. It is stable and fast: annotating a mouse (M. musculus) genome takes less than one hour on a single 24-core Intel Xeon Gold server, assuming input of aligned RNA-seq reads in BAM format and ~346Mb of protein sequences from several related species, including human.

If you encounter any problems, feel free to open an issue in the issue section. Please also support the original authors. If you use EviAnn, kindly cite it: Aleksey V. Zimin, Daniela Puiu, Mihaela Pertea, James A. Yorke, Steven L. Salzberg. Efficient evidence-based genome annotation with EviAnn. bioRxiv 2025.05.07.652745; doi: https://doi.org/10.1101/2025.05.07.652745

Installation instructions

To install, first download the latest distribution tarball, zgtools-EviAnn_*.tar.gz (not one of the Source code files!), from the GitHub release page: https://github.com/linyuiz/EviAnn_update/releases.

wget https://github.com/linyuiz/EviAnn_update/releases/download/v2.02-2/zgtools-EviAnn_2.0.2_v2.tar.gz
tar -xvzf zgtools-EviAnn_*.tar.gz
cd zgtools-EviAnn_*
export LD_LIBRARY_PATH=/usr/lib64:/lib64
./install.sh
mamba install agat seqkit TransDecoder minimap2 hisat2 #or conda install

The installation script will configure and build all necessary packages. The EviAnn executables will appear under zgtools-EviAnn_2.0.2/. You can then run EviAnn from anywhere by executing zgtools-EviAnn_2.0.2/zgtools.

Dependencies

For the dependencies of this software, see https://github.com/alekseyzimin/EviAnn_release?tab=readme-ov-file#dependencies. In addition to seqkit, agat, and hisat2, the pv command is also required. It can be installed as follows:

① Debian/Ubuntu and derivatives:
sudo apt update
sudo apt install pv

② RHEL/CentOS/Fedora:
sudo yum install pv # CentOS 7
sudo dnf install pv # CentOS 8/Fedora

③ Compile and install from source:
http://www.ivarch.com/programs/pv.shtml

Prepare Data

Prepare the input data as follows: RNA-seq files must end with .fq.gz, .fq, .fastq, or .fastq.gz, and protein files must end with .pep.fa. No GFF file is needed; only the protein file is required. Note that names such as _R1.clean.fq.gz are not allowed; only names of the form {1,2}.fq.gz are permitted.
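The naming rules above can be checked before starting a run. The following is a minimal sketch and not part of EviAnn: the function names check_paired_rna_name and check_pep_name are hypothetical, and it assumes that "{1,2}.fq.gz" means the mate number 1 or 2 sits immediately before the extension. How zgtools itself matches file names may differ.

```shell
# Hypothetical helper: succeed if a paired-end read file name ends with
# 1 or 2 directly followed by one of the accepted extensions.
check_paired_rna_name() {
    case "$1" in
        *[12].fq.gz|*[12].fq|*[12].fastq.gz|*[12].fastq) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if a protein file name ends with .pep.fa.
check_pep_name() {
    case "$1" in
        *.pep.fa) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a directory of reads (RNAseq_dir is a placeholder):
#   for f in RNAseq_dir/*; do
#       check_paired_rna_name "$(basename "$f")" || echo "rename: $f"
#   done
```

Under this assumption a name such as sample_1.fq.gz passes, while sample_R1.clean.fq.gz is rejected because "clean" separates the mate number from the extension.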

For homologous proteins, it is recommended to download more sequences rather than fewer. Protein data from 5 closely related species is generally sufficient. If the BUSCO completeness score is not high enough, you can widen the range of related species and include more proteins, anywhere from 8,000 up to one million. You can also use the BUSCO database proteins as input, for example by copying the "embryophyta_odb10/ancestral" file to "embryophyta.pep.fa".

EviAnn now also accepts GFF files from de novo gene prediction as additional evidence. An example of the list format follows; note that absolute paths must be provided:

/project/99.EviAnn/00.used_data/Augustus.gff
/project/99.EviAnn/00.used_data/GeneMark.gff
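Because the list must contain absolute paths, it can be convenient to generate it rather than type it. The sketch below is an illustration only: make_gff_list is a hypothetical helper, and it assumes the list holds one path per line, as in the example above.

```shell
# Hypothetical helper: write the absolute path of every .gff file in a
# directory to a list file, one path per line.
make_gff_list() {
    gff_dir=$1
    list=$2
    : > "$list"                            # create or empty the list file
    for f in "$gff_dir"/*.gff; do
        [ -e "$f" ] || continue            # skip the literal pattern when nothing matches
        abs_dir=$(cd "$(dirname "$f")" && pwd)
        printf '%s/%s\n' "$abs_dir" "$(basename "$f")" >> "$list"
    done
}

# Example (directory and list names are placeholders):
#   make_gff_list 00.used_data denovo.gff.list
```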

Usage

You just need to soft-link zgtools into your usual bin folder, such as ~/bin, or call it by its absolute path, such as /project/softawre/zgtools-EviAnn_2.0.2_v2/zgtools EviAnn. Be sure to have hisat2 and seqkit in your $PATH.
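A quick way to confirm that the required tools are on $PATH is sketched below. check_tools is a hypothetical helper, not something shipped with zgtools; it only tests whether each named executable can be found and does not check versions.

```shell
# Hypothetical helper: report which of the given executables are on $PATH.
# The return status is the number of missing tools (0 means all found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# hisat2, seqkit and pv are the tools this README names as required.
check_tools hisat2 seqkit pv || echo "install the missing tools before running zgtools EviAnn"
```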

Usage:

        zgtools EviAnn genome.fa Pep_dir/ RNAseq_dir/ 60 3 Pair_NGS other.gff.list

        genome.fa             --Genome File
        Pep_dir/              --Homo Pep Dir
        RNAseq_dir/           --RNAseq Dir
        60                    --Threads
        3                     --Parallel Task Num
        Pair_NGS              --RNAseq Type(Pair_NGS/Single_NGS)
        other.gff.list        --Other Gff List

Example1:

        zgtools EviAnn 00.used_data/genome.fa 00.used_data/00.homo_data/ 00.used_data/01.RNA_data/ 60 3 Pair_NGS denovo.gff.list

Example2:

        zgtools EviAnn 00.used_data/genome.fa 00.used_data/00.homo_data/ 00.used_data/01.RNA_data/ 60 3 Single_NGS none

Note that the total number of threads used is Threads multiplied by Parallel Task Num; for example, 60 x 3 = 180 threads.
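The arithmetic can be checked against the machine before launching. This is a small sketch under the assumption that nproc (GNU coreutils) is available; the variable names are illustrative and the values are those from the examples above.

```shell
threads=60      # fourth argument: Threads
tasks=3         # fifth argument: Parallel Task Num
total=$((threads * tasks))
echo "total threads requested: $total"

# Warn if the request exceeds the cores visible to this shell.
if command -v nproc >/dev/null 2>&1; then
    cores=$(nproc)
    if [ "$total" -gt "$cores" ]; then
        echo "warning: requesting $total threads on $cores cores"
    fi
fi
```

If the warning appears, lowering either value reduces the total, since the two are multiplied.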

Run log

This is the runtime log of the command zgtools EviAnn genome.fa 00.homo_data/ 01.RNA_data/ 60 3 Pair_NGS other.gff.list:

Main output

In the output directory, the main files include: EviAnn.gene.gff, EviAnn.pep.fa, EviAnn.cds.fa, and EviAnn.transcripts.fa, which are the gene GFF3 file with pseudogene annotations, protein sequences, CDS sequences, and transcript sequences, respectively.
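As a quick sanity check on these files, the number of sequences and annotated genes can be counted with standard tools. The helpers below are hypothetical and not part of EviAnn; they assume ordinary FASTA headers starting with ">" and tab-separated GFF3 with the feature type in column 3. The output directory name is a placeholder.

```shell
# Hypothetical helper: count FASTA records (header lines starting with ">").
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: count GFF3 lines whose feature type (column 3)
# equals the second argument, e.g. gene or mRNA.
count_gff_features() {
    awk -F '\t' -v type="$2" '$3 == type { n++ } END { print n + 0 }' "$1"
}

# Example (out_dir is a placeholder for the actual output directory):
#   count_fasta_records out_dir/EviAnn.pep.fa
#   count_gff_features  out_dir/EviAnn.gene.gff gene
```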
