vcf_consensus

Generate Consensus DNA Sequences from VCF and FASTA

vcf_consensus is a CLI application written in Python that generates consensus DNA sequences for individual samples based on variations in a VCF file and a reference genome in FASTA format.

Supports both uncompressed and compressed files (.vcf.gz, .fasta.gz).

Installation

Note: the current version uses uv as its package manager, so make sure uv is installed on your system. If you don't have it yet, install it with:

pip install uv

1. Clone the repository

git clone https://github.com/yourusername/vcf_consensus.git
cd vcf_consensus

2. Install in a virtual environment

python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate   # Windows

uv pip install -e .

Usage

Basic execution

python -m vcf_consensus.cli \
    --vcf input.vcf \
    --fasta reference.fasta \
    --length 500 \
    --count 10000 \
    --threshold 0.5 \
    --output output_consensus.fasta

Command-line options

--vcf           Path to the VCF file (supports .vcf.gz).
--fasta         Path to the FASTA reference genome (supports .fasta.gz).
--length        Length of each consensus sequence.
--count         Number of consensus sequences to generate.
--threshold     Probability of using an alternative allele when present in the VCF (default: 0.0).
                A value of 0.5 means that 50% of heterozygous positions will use an alternative allele.
--output        Output FASTA file for consensus sequences.
--seed          Random seed for reproducibility (default: None).
--chrom-map     (Optional) Manual chromosome name mapping if VCF and FASTA names differ (e.g., "1=chr1,2=chr2").
                Use this only if chromosomes in the VCF do not match those in the FASTA.
--mode          Method for selecting consensus start positions (default: random).
                Options:
                  - "random": Randomly selects start positions.
                  - "sequential": Selects positions sequentially with 50% overlap.
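
To make the two --mode values more concrete, here is a minimal sketch of how start positions could be chosen. It is an illustration only, not the tool's actual code: the function names `random_starts` and `sequential_starts` are hypothetical, positions are assumed to be 0-based, and "50% overlap" is assumed to mean a step of half the window length.

```python
import random


def random_starts(seq_len, length, count, seed=None):
    """Hypothetical helper: draw `count` start positions uniformly at random."""
    last_start = seq_len - length  # last start that still fits a full window
    if last_start < 0:
        return []
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [rng.randint(0, last_start) for _ in range(count)]


def sequential_starts(seq_len, length, count):
    """Hypothetical helper: consecutive windows that overlap by 50%."""
    last_start = seq_len - length
    if last_start < 0:
        return []
    step = max(1, length // 2)  # assumption: 50% overlap means step = length / 2
    return list(range(0, last_start + 1, step))[:count]
```

Under these assumptions, sequential mode is bounded by the sequence length: a 2000-base sequence with --length 500 yields at most 7 windows, regardless of how large --count is.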

Examples

1. Basic execution

python -m vcf_consensus.cli \
    --vcf example.vcf \
    --fasta example.fasta \
    --length 500 \
    --count 10000 \
    --threshold 0.5 \
    --output consensus.fasta
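
The effect of --threshold 0.5 in the command above can be pictured as an independent random draw per variant. The sketch below is a simplified illustration, not the tool's implementation: `apply_variants` is a hypothetical name, only single-base substitutions are handled, genotypes are ignored, and variant positions are assumed to be 0-based offsets into the window.

```python
import random


def apply_variants(ref_seq, variants, threshold, rng):
    """Hypothetical helper: substitute each alt allele with probability `threshold`.

    `variants` maps a 0-based offset in `ref_seq` to a single-base alt allele.
    """
    bases = list(ref_seq)
    for pos, alt in variants.items():
        # rng.random() is in [0.0, 1.0), so 0.0 never substitutes and 1.0 always does
        if rng.random() < threshold:
            bases[pos] = alt
    return "".join(bases)


rng = random.Random(123)  # plays the role of --seed in this sketch
window = apply_variants("ACGTACGT", {1: "T", 6: "A"}, 0.5, rng)
```

With a threshold of 0.0 the reference window is returned unchanged, which matches the documented default.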

2. Using .vcf.gz and .fasta.gz

python -m vcf_consensus.cli \
    --vcf example.vcf.gz \
    --fasta example.fasta.gz \
    --length 500 \
    --count 10000 \
    --threshold 0.5 \
    --output consensus.fasta
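
Reading plain and gzip-compressed inputs through the same code path is commonly done by checking the file extension. The following is a minimal sketch of that idea using only the standard library; `open_text` is a hypothetical helper name, and the tool itself may detect compression differently.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Hypothetical helper: open a plain or .gz file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        # "rt" makes gzip.open decode bytes to str, like the built-in open()
        return gzip.open(path, "rt")
    return open(path, "rt")
```

A caller can then write `with open_text("example.vcf.gz") as handle:` and iterate over lines without caring whether the file is compressed.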

3. Specifying chromosome name mapping

python -m vcf_consensus.cli \
    --vcf example.vcf \
    --fasta example.fasta \
    --length 500 \
    --count 10000 \
    --threshold 0.5 \
    --output consensus.fasta \
    --chrom-map "1=chr1,2=chr2"
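
The --chrom-map value is a comma-separated list of name pairs. As a rough sketch of how such a string could be turned into a lookup table, consider the hypothetical `parse_chrom_map` below. It assumes the left side of each pair is the VCF name and the right side is the FASTA name; the README does not state the direction, so treat that as an assumption.

```python
def parse_chrom_map(spec):
    """Hypothetical helper: parse "1=chr1,2=chr2" into {"1": "chr1", "2": "chr2"}."""
    mapping = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        vcf_name, sep, fasta_name = pair.partition("=")
        if not sep or not vcf_name.strip() or not fasta_name.strip():
            raise ValueError(f"Malformed mapping entry: {pair!r}")
        mapping[vcf_name.strip()] = fasta_name.strip()
    return mapping


chrom_map = parse_chrom_map("1=chr1,2=chr2")
# Names missing from the map fall back to themselves
fasta_name = chrom_map.get("X", "X")
```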

Performance Considerations

FASTA files are loaded fully into memory, which may cause problems for FASTA files larger than 3 GB.
Large VCF files (>10 million variants) may cause high memory usage, as with large FASTA files.
If you experience slow performance, reduce --count or increase --length to reduce the number of consensus sequences.
Known issue: mismatches between the CHROM names in the VCF and the header names in the FASTA (the lines starting with >) are not yet fully resolved.

Fixes are planned for an upcoming version.

License

This project is distributed under the MIT License.
