This repository contains a collection of programs for analysing sequence data.
- find-motifs: finds (short) specified motifs in sequences from a FASTA file.
- find-homopolymers: finds homopolymer tracts, as well as di- and tri-nucleotide repeats, in sequences from a FASTA file. See the documentation.
- coverotron: computes per-base or per-window sequence read coverage from a set of BAM or CRAM files.
- tabulate-alignments: reports the count of reads split by read base (or insertion/deletion) and mapping quality at each position in a given range or ranges.
- iorek-qc: quantifies the presence of 'true' and 'error' kmers in a set of sequence reads (from a FASTQ file), based on a database of true kmers in jellyfish2 format.
- tabulate-mismatches: walks all reads and reports all mismatches / insertions / deletions stratified by their type and flanking sequence. See the documentation.
- assess-qualities: quantifies the error rate predicted by base qualities, against the observed error rate, in a set of read alignments.
- zoomsa: reads a multiple sequence alignment FASTA file, and writes an interactive HTML visualisation of it.
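To illustrate the kind of comparison assess-qualities describes, here is a minimal sketch in Python. It is not iorek's implementation. It assumes the standard Phred convention, in which a base quality Q implies an error probability of 10^(-Q/10), and it groups bases by quality to compare that predicted rate with the observed mismatch rate. The input format (an iterable of hypothetical `(quality, is_error)` pairs, one per aligned read base) and the function names are assumptions made for this example.

```python
from collections import defaultdict


def phred_to_error(q: int) -> float:
    """Error probability implied by a Phred-scaled base quality Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def calibrate(observations):
    """Compare predicted vs observed error rate for each base quality value.

    `observations` is a hypothetical iterable of (quality, is_error) pairs,
    one per aligned read base; is_error is True when the read base does not
    match the reference. Returns {quality: (predicted, observed, n_bases)},
    ordered by quality.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [n_bases, n_errors]
    for q, is_error in observations:
        counts[q][0] += 1
        counts[q][1] += int(is_error)
    return {
        q: (phred_to_error(q), n_err / n, n)
        for q, (n, n_err) in sorted(counts.items())
    }
```

For example, 100 bases at Q20 with one mismatch give a predicted and an observed error rate of 0.01 each, i.e. well-calibrated qualities. Binning by exact quality value keeps the sketch simple; a real tool would likely also stratify by other covariates.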
Iorek is released under the Boost Software License. See the LICENSE.txt file for details.
Iorek makes use of several other libraries that are included in the source repository and released under their own respective licenses. These include:
- boost
- htslib
- SeqLib
- parallel-hashmap
- moodycamel concurrent queue
- jellyfish2
- Eigen
- sqlite3
- zstandard
- catch2
- wfa2
Please see the respective license files in subdirectories of 3rd_party/ for details.
Iorek is built using the waf build tool, which is bundled with the code. A basic compilation cycle is:
$ ./waf configure
$ ./waf
Executables will appear in the build/apps/ directory.
You can also optionally specify an installation prefix and ask waf to install the result:
$ ./waf configure --prefix=[installation path]
$ ./waf install
Executables will be copied to the [installation path]/bin/ directory.
Iorek was written by Gavin Band with contributions from Tom Roberts.