VirusMVP is an interactive heatmap-centric app that integrates viral genomic mutations, lineage information and curated functional impact to study the spread and evolution of viruses in Canada and globally.

VIRUS-MVP

VIRUS-MVP is a heatmap-centric visualization web application that encodes mutational information across viral populations, e.g., SARS-CoV-2 and mpox. You can find deployed versions of VIRUS-MVP (without user upload functionality) at https://virusmvp.org/.

app_interface

The data visualized by VIRUS-MVP is generated and annotated by two upstream components of this project:

  • nf-ncov-voc - genomics workflow for variant calling and annotation
  • pokay - mutation function annotation repository

data_to_app

Native vs Docker installation

VIRUS-MVP can be installed natively, or built through Docker.

A native installation will provide optimal performance on Windows and Mac machines, because Docker containers on those platforms run on a Linux virtualization layer that carries varying performance costs. The VIRUS-MVP Docker containers also use port mapping, which incurs additional performance costs.

However, Docker installations are highly portable. Installing VIRUS-MVP through Docker maintains a consistent and reproducible environment across systems, with minimal dependency errors. This consistency is especially useful when deploying the application, as variability between testing and production environments will be reduced.

Native installation steps

0. (If uploading your own data) Install Nextflow + Docker

File uploads trigger the nf-ncov-voc workflow written in Nextflow.

1. Clone the repository and its submodules

$ git clone git@github.com:cidgoh/VIRUS-MVP.git --recurse-submodules

2. Set up a venv environment

This does not incur the performance overhead of a Docker container, because venv only creates a dedicated folder for the dependencies, which are still installed natively on your operating system.

$ cd VIRUS-MVP

$ python3 -m venv myenv

$ source myenv/bin/activate

(myenv) $ pip install -r requirements.txt

3. Run the application

(myenv) $ python app.py

Go to http://0.0.0.0:8050/.

Note: Run the app from the root project directory to ensure all assets (e.g., JavaScript) load correctly.

Docker installation steps

The setup is relatively simple. Just make sure you have Docker installed, then run the following from the root project directory:

$ docker-compose build

$ docker-compose up

Warning: our Docker setup bind mounts the host's Docker socket into the container. You should put a socket proxy in front of it before deploying.

One currently unresolved issue: If you upload a file while the application is deployed through Docker, and then later attempt to upload a file while the application is deployed natively, the application will likely run into permission issues related to the nf-ncov-voc cache. You can fix this by removing all cache files in the nf-ncov-voc/ directory:

$ rm -r results work .nextflow .nextflow.log* capsule framework plugins secrets tmp

You may have to use sudo.
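
If you would rather preview what will be deleted before removing anything, the cleanup above can be sketched in Python. This is a minimal illustration and not part of VIRUS-MVP: the function name `clear_workflow_cache` and its `dry_run` parameter are hypothetical, and the entry names are copied from the `rm` command above.

```python
import shutil
from pathlib import Path

# Names copied from the rm command above; ".nextflow.log*" is a glob pattern.
CACHE_PATTERNS = ["results", "work", ".nextflow", ".nextflow.log*",
                  "capsule", "framework", "plugins", "secrets", "tmp"]


def clear_workflow_cache(workflow_dir, dry_run=True):
    """Return cache entries found under workflow_dir, deleting them unless dry_run.

    Hypothetical helper. With dry_run=True (the default) nothing is deleted.
    """
    root = Path(workflow_dir)
    if not root.is_dir():
        return []
    matched = sorted({p for pattern in CACHE_PATTERNS for p in root.glob(pattern)})
    if not dry_run:
        for path in matched:
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)  # directories such as work/ or results/
            else:
                path.unlink()        # plain files such as .nextflow.log
    return matched


# Preview only: lists matching entries in nf-ncov-voc/ without deleting them.
for entry in clear_workflow_cache("nf-ncov-voc", dry_run=True):
    print(entry)
```

Calling `clear_workflow_cache("nf-ncov-voc", dry_run=False)` performs the deletion. If the files were created by the Docker deployment, the script may need the same elevated permissions as the `rm` command.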

Usage

A navbar at the top of the application has links to both this repository and the underlying genomics workflow. Clicking the "TUTORIAL" link will display this README in-app.

navbar

A legend at the top of the application provides a detailed explanation of the heatmap view.

legend

Heatmap view

The left axis encodes viral lineages. Lineages belonging to variants of concern (VOC) are in bold, and lineages belonging to variants of interest (VOI) are in italics. Actively circulating lineages are denoted with ⚠️.

The right axis encodes the number of genomic sequences analyzed for each lineage.

The top axis encodes the nucleotide position of lineage mutations, with respect to the reference genome.

The bottom axis encodes the amino acid position of lineage mutations, in the following format:

Genic mutations: {GENE}.{AMINO ACID POSITION WITHIN THAT GENE}

Intergenic mutations: {NEAREST DOWNSTREAM GENE}.{NUMBER OF NUCLEOTIDES UPSTREAM}
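
As a rough illustration of the two label formats, the following Python sketch builds a bottom-axis label from its parts. It is not taken from the application code: the function name `bottom_axis_label`, its parameters, and the example values are hypothetical.

```python
from typing import Optional


def bottom_axis_label(gene: str, aa_pos: Optional[int] = None,
                      nt_upstream: Optional[int] = None) -> str:
    """Build a bottom-axis label in the format described above.

    Hypothetical helper: genic mutations pass aa_pos (amino acid position
    within the gene); intergenic mutations pass nt_upstream (number of
    nucleotides upstream of the nearest downstream gene).
    """
    if (aa_pos is None) == (nt_upstream is None):
        raise ValueError("Provide exactly one of aa_pos or nt_upstream")
    position = aa_pos if aa_pos is not None else nt_upstream
    return f"{gene}.{position}"


print(bottom_axis_label("S", aa_pos=501))           # genic: S.501
print(bottom_axis_label("ORF1ab", nt_upstream=25))  # intergenic: ORF1ab.25
```

Both kinds of label share the same `{GENE}.{NUMBER}` shape, so the number alone does not indicate whether a mutation is genic or intergenic.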

The heatmap cells encode the presence of mutations. The color of these cells encodes mutation frequency. Insertions, deletions, functional mutations, and lineages with a sample size of one are encoded as follows:

heatmap_cells

Hovering over cells displays detailed mutation information. Clicking cells opens a modal with detailed mutation function descriptions, and their citations.

scroll_hover_click

Histogram

The histogram bars encode the total number of mutations across all visualized lineages, counted in bins of 100 nucleotide positions. The small black bar at the bottom of the histogram view indicates which section of the genome you have currently scrolled to in the heatmap.
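
The binning behind the histogram bars can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the function name `bin_mutation_counts`, the flat list of positions as input, and the choice of 1-based bins (1-100, 101-200, and so on) are all assumptions.

```python
from collections import Counter

BIN_SIZE = 100  # nucleotide positions per histogram bar, per the text above


def bin_mutation_counts(positions, bin_size=BIN_SIZE):
    """Count mutations per bin of bin_size nucleotide positions.

    positions is a hypothetical flat list of 1-based nucleotide positions,
    one entry per mutation across all visualized lineages. Returns a dict
    mapping the first position of each non-empty bin to its mutation count.
    """
    counts = Counter(((pos - 1) // bin_size) * bin_size + 1 for pos in positions)
    return dict(sorted(counts.items()))


example_positions = [21, 57, 100, 101, 23403, 23403]
print(bin_mutation_counts(example_positions))
# {1: 3, 101: 1, 23401: 2}
```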

histogram_hover_scroll

To navigate the heatmap more quickly, you can click on the genes in the histogram view. Clicking a gene in the histogram will automatically scroll the heatmap to the left-most mutation in that gene.
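
The "left-most mutation in that gene" lookup amounts to finding the smallest mutated position inside the gene's coordinates. A minimal sketch, assuming mutation positions and gene boundaries are available as plain integers (the function name and inputs are hypothetical, and the example positions are illustrative):

```python
def leftmost_mutation_in_gene(mutation_positions, gene_start, gene_end):
    """Return the smallest mutation position within [gene_start, gene_end].

    Returns None when no mutation falls inside the gene.
    """
    in_gene = [p for p in mutation_positions if gene_start <= p <= gene_end]
    return min(in_gene) if in_gene else None


# Illustrative nucleotide positions; the gene range is the SARS-CoV-2 S gene
# on reference NC_045512.2 (21563-25384).
positions = [241, 3037, 23403, 23063]
print(leftmost_mutation_in_gene(positions, 21563, 25384))  # 23063
```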

Toolbar

There are several tools at the top of the interface that can be used to edit the visualization.

select_lineages_btn

Clicking the select groups button opens a modal that allows you to rearrange and hide variants.

upload_btn

Clicking the upload button allows you to upload a FASTA or VCF file, which will then be processed by nf-ncov-voc to generate a new GVF file, and then rendered onto the heatmap. You can find examples of files users can upload in test_data/.

You must have Nextflow and Conda installed to upload files.

Your first upload will take a while. Subsequent uploads will be faster.

download_btn

Clicking the download dropdown menu allows you to download surveillance reports generated by nf-ncov-voc for the lineages visible in the heatmap. You can also download a mutation index JSON file, which provides parsable data on all the information rendered in the heatmap.

search_for_mutations_btn

Clicking the "search for mutations" button allows you to search for specific mutations by name, and automatically scroll the heatmap to that mutation.

jump_to_nt_pos_input

Typing a nucleotide position in the "jump to nucleotide position" textbox will automatically scroll the heatmap to that specific nucleotide position.

mutation_freq_slider

The mutation frequency slider allows you to filter heatmap cells by mutation frequency.
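
Conceptually, the slider keeps only cells whose frequency falls inside a chosen range. The sketch below illustrates that idea and is not the application's code: the cell dictionaries, their keys, and the assumption that the slider selects an inclusive lower and upper bound are all hypothetical.

```python
def filter_by_frequency(cells, min_freq, max_freq):
    """Keep cells whose mutation frequency lies in [min_freq, max_freq]."""
    return [c for c in cells if min_freq <= c["frequency"] <= max_freq]


# Hypothetical heatmap cells; lineage and mutation names are placeholders.
cells = [
    {"lineage": "lineage_1", "mutation": "mut_a", "frequency": 0.05},
    {"lineage": "lineage_1", "mutation": "mut_b", "frequency": 0.60},
    {"lineage": "lineage_2", "mutation": "mut_a", "frequency": 1.00},
]

for cell in filter_by_frequency(cells, 0.5, 1.0):
    print(cell["lineage"], cell["mutation"], cell["frequency"])
```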

clade_defining_switch

The clade-defining switch allows you to show or hide heatmap cells corresponding to non-clade-defining mutations.

Adapting the workflow to a new virus

To adapt this workflow for a new virus, users must provide a defined set of input files. These are processed through two modular components, the Genomics Workflow and the Functional Annotation pipeline.

The figure below summarizes all required inputs (🟩), outputs generated by the workflow (🟦), and hybrid files that require both manual curation and automated generation (🟪).

TODO commit asset Workflow Diagram

✅ Required user-provided files (🟩):

TODO fill out example links

Reference genome files

  • GFF - Reference genome annotations (e.g., from NCBI RefSeq or Ensembl)
  • FASTA - Reference genome sequence

Ontology file

Viral genome sequences

  • FASTA - Assembled viral genomes
  • TSV - Associated metadata

Functional annotations (optional)

  • TSV - Functional annotation file in Pokay or custom format

🟪 Most Important File: genome_config.JSON

⚠️ Critical for customization
genome_config.JSON is the most important file for adapting the workflow to a new virus. It acts as the central resource file connecting reference features, ontologies, and annotations.

  • Partially auto-generated using provided inputs and helper scripts
  • Manual curation is required for accurate ontology term mapping, virus-specific details, and layout configuration
  • This file controls how genomic features and annotations are interpreted and visualized across all components of the framework

βš™οΈ Script-generated files (🟦):

  • GVF, Functional Annotation TSV, metadata TSVs, and summary PDFs
  • These are produced automatically by the workflow after running the Genomics Workflow and Functional Annotation pipelines

Future directions

We plan to build an API, which users can call to retrieve a text-based representation of the information rendered in VIRUS-MVP. We have begun this process by introducing the ability to download a mutation index JSON file, as mentioned above.

We also plan to modify the interface for a more intuitive display of segmented genomes, such as RSV or Influenza. You can track our progress on the segmented_demo branch. In short, we will provide a dropdown that lets users render discrete segments of the genome, one at a time.

Support

We encourage you to report any problems with the application by opening an issue in this repository, but you can also email us at contact@cidgoh.ca.

Authors and acknowledgement

@ivansg44: Visualization development

@anwarMZ: Genomic analysis

@Anoosha-Sehar: Functional annotation

@miseminger: Functional annotation and data standardization

William Hsiao, Gary Van Domselaar, and Paul Gordon

The results here are in whole or in part based upon data hosted at the Canadian VirusSeq Data Portal: https://virusseq-dataportal.ca/. We wish to acknowledge the following organisations/laboratories for contributing data to the Portal: Canadian Public Health Laboratory Network (CPHLN), CanCOGeN VirusSeq, Saskatchewan - Roy Romanow Provincial Laboratory (RRPL), Nova Scotia Health Authority, Alberta ProvLab North (APLN), Queen's University / Kingston Health Sciences Centre, National Microbiology Laboratory (NML), BCCDC Public Health Laboratory, Public Health Ontario (PHO), Newfoundland and Labrador - Eastern Health, Unity Health Toronto, Ontario Institute for Cancer Research (OICR), and Manitoba Cadham Provincial Laboratory.

License

MIT
