A workflow for analyzing differential methylation using bedmethyl files from ONT long-read sequencing

NyagaM/ont-methylDMR-kit

Differential methylation analysis using bedmethyl files from long-read (ONT) data

ont-methylDMR-kit is a pipeline for calling differentially methylated regions (DMRs) between haplotypes, between two samples, or between two groups, coupled with annotation and visualisation. It operates on long-read (ONT) sequencing bedmethyl files generated using modkit. The pipeline is inspired by my work on rare disorders and by the potential of long-read sequencing to comprehensively identify modified bases such as 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), N6-methyladenine (6mA), and N4-methylcytosine (4mC), which have been identified in a growing number of rare and imprinted disorders.

The pipeline is built using Nextflow, a bioinformatics workflow manager that enables the development of portable and reproducible workflows. Two Docker images, available from DockerHub, contain all the tools and software required by the pipeline, making results highly reproducible.

DMR analysis:

DMR analysis is performed using the R package DSS. It supports calling DMRs for 5-methylcytosine (--5mC), 5-hydroxymethylcytosine (--5hmC), N6-methyladenine (--6mA), and N4-methylcytosine (--4mC) modified bases, as well as haplotype-specific DMRs using --phased_mC, --phased_mA, and --phased_hmC. Provide raw bedmethyl files generated by modkit using either --input_file1 and --input_file2 (for calling DMRs between haplotypes or between two samples) or --input_group1 and --input_group2 (for group analysis; bedmethyl files in these two folders must have the *.bed extension). Note that the --phased_mC and --phased_hmC flags are not supported in group analysis; use --5mC, --4mC, or --6mA separately, depending on the type of methylation being analysed. Methylation positions covered by fewer than 5 reads are filtered out by default.

The current default options for DSS are: delta (threshold for defining a DMR) of 10%; p-value threshold for calling a DMR of 0.01; minimum length required for DMR methylation change analysis of 100 base pairs; minimum number of CpG sites required for a DMR of 10; and merging of two DMRs that are less than 100 base pairs apart. To change these parameters, edit main.nf in process dmr_calling, process group_dmr_calling_5mC, or process group_dmr_calling_6mA. See the DSS documentation for more information.
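To make the coverage filter concrete, here is a minimal Python sketch of reducing a modkit bedmethyl file to the four values DSS works with (chromosome, position, total reads N, modified reads X). It is an illustration only, not the pipeline's own code: the function name `bedmethyl_to_dss_rows` is hypothetical, and the column positions assume the standard modkit bedMethyl layout (modification code in column 4, valid coverage in column 10, modified-read count in column 12).

```python
import io

MIN_COVERAGE = 5  # positions covered by fewer than 5 reads are dropped


def bedmethyl_to_dss_rows(handle, mod_code="m", min_cov=MIN_COVERAGE):
    """Yield (chrom, pos, N, X) tuples for one modification code.

    Assumes the modkit bedMethyl column order; this helper is a
    hypothetical illustration, not a function from the pipeline.
    """
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()  # handles tab- or space-separated columns
        chrom, start, code = fields[0], int(fields[1]), fields[3]
        n_valid, n_mod = int(fields[9]), int(fields[11])
        if code != mod_code or n_valid < min_cov:
            continue
        yield chrom, start, n_valid, n_mod


# Three made-up records: well covered 5mC, low-coverage 5mC, and 5hmC.
example = io.StringIO(
    "chr1\t100\t101\tm\t12\t+\t100\t101\t255,0,0\t12\t75.00\t9\t3\t0\t0\t0\t0\t0\n"
    "chr1\t200\t201\tm\t3\t+\t200\t201\t255,0,0\t3\t33.33\t1\t2\t0\t0\t0\t0\t0\n"
    "chr1\t300\t301\th\t8\t+\t300\t301\t255,0,0\t8\t25.00\t2\t6\t0\t0\t0\t0\t0\n"
)
rows = list(bedmethyl_to_dss_rows(example))
print(rows)
```

Only the first record survives: the second has fewer than 5 reads and the third carries a different modification code.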

Annotation:

Significant DMRs are annotated to indicate whether they overlap promoters, exons, or introns. The annotation information, based on GENCODE v44, is provided with the pipeline as a compressed file, annotations.zip, which must be extracted before use (tar -xvf annotations.zip).
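The idea behind this kind of annotation is interval overlap. The Python sketch below labels one DMR with every feature type it overlaps; it is a simplified illustration under assumed inputs, not the pipeline's annotation process. The feature tuples and the helper names `overlaps` and `annotate_dmr` are hypothetical, and coordinates are treated as half-open intervals as in BED.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def annotate_dmr(dmr, features):
    """Return the sorted feature types overlapping a (chrom, start, end) DMR."""
    chrom, start, end = dmr
    return sorted(
        {
            kind
            for f_chrom, f_start, f_end, kind in features
            if f_chrom == chrom and overlaps(start, end, f_start, f_end)
        }
    )


# Made-up features for illustration only.
features = [
    ("chr1", 900, 1100, "promoter"),
    ("chr1", 1100, 1500, "exon"),
    ("chr1", 1500, 3000, "intron"),
    ("chr2", 900, 1100, "promoter"),
]
labels = annotate_dmr(("chr1", 1000, 1200), features)
print(labels)
```

A real annotation set has many more intervals, so an indexed structure (or a dedicated tool such as bedtools) would normally replace the linear scan shown here.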

Visualisation:

Annotated DMRs are plotted using modbamtools. Plotting is supported for haplotype-specific DMRs (provide a haplotagged modified BAM file using --phased_modBam) and for DMRs between two samples (provide modified BAM files using --input_modBam1 and --input_modBam2), but not for DMRs from group analysis. You can provide a gene list with --gene_list to plot only the significant DMRs for those genes (if present). There is also a plots-only mode, triggered by the flag --plots_only, which requires an annotated DMR bed file passed with --annotated_dmrs (generated by process annotate_dmrs) and the modified BAM files.

Installation and Usage:

$ git clone https://github.com/NyagaM/ont-methylDMR-kit.git
$ cd ont-methylDMR-kit
$ tar -xvf annotations.zip

To view usage and run options:

$ nextflow run main.nf --help
Usage: nextflow run main.nf [options]

Options:
  --input_file1     First input bedmethyl file from sample_1 (for DMRs between two samples) or haplotype_1 bedmethyl (for DMRs between haplotypes) (required)
  --input_file2     Second input bedmethyl file from sample_2 (for DMRs between two samples) or haplotype_2 bedmethyl (for DMRs between haplotypes) (required)
  --output_dir      Output directory (required)
  --input_modBam1   First input modified BAM used to generate bedmethyl for sample_1 (optional)
  --input_modBam2   Second input modified BAM used to generate bedmethyl for sample_2 (optional)
  --input_group1    Directory containing bedmethyl files (with *.bed extension) for group 1 (optional)
  --input_group2    Directory containing bedmethyl files (with *.bed extension) for group 2 (optional)
  --gene_list       A list of genes as tsv file to generate plots on (optional)
  --5mC             Use this flag to trigger 5mC DMR calling (optional)
  --5hmC            Use this flag to trigger 5hmC DMR calling (optional)
  --6mA             Use this flag to trigger 6mA DMR calling (optional)
  --4mC             Use this flag to trigger 4mC DMR calling (optional)
  --phased_mC       Use this flag to trigger haplotagged 5mC/4mC DMR calling if input files are haplotagged bedmethyls (optional)
  --phased_mA       Use this flag to trigger haplotagged 6mA DMR calling if input files are haplotagged bedmethyls (optional)
  --phased_hmC      Use this flag to trigger haplotagged 5hmC DMR calling if input files are haplotagged bedmethyls (optional)
  --phased_modBam   Haplotagged modified BAM (required for plotting DMRs if --phased_mC/--phased_mA is used)
  --plots_only      Only run the plotting processes, requires --annotated_dmrs and BAM files
  --annotated_dmrs  Path to annotated DMR bed file (required if --plots_only is used)
  --help            Print this help message
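The options above interact: group analysis does not accept --phased_mC or --phased_hmC, and --plots_only needs --annotated_dmrs. The Python sketch below encodes those rules as a pre-flight check on a dictionary of options. It is a hypothetical helper written from the README text, not the validation that main.nf itself performs.

```python
PHASED_FLAGS_UNSUPPORTED_IN_GROUPS = ("phased_mC", "phased_hmC")


def check_options(opts):
    """Return a list of problems found in a dict of pipeline options.

    Hypothetical helper: the rules are taken from the README, not from main.nf.
    """
    problems = []
    if not opts.get("output_dir"):
        problems.append("--output_dir is required")
    if opts.get("plots_only"):
        if not opts.get("annotated_dmrs"):
            problems.append("--plots_only requires --annotated_dmrs")
    elif opts.get("input_group1") or opts.get("input_group2"):
        if not (opts.get("input_group1") and opts.get("input_group2")):
            problems.append("group analysis needs both --input_group1 and --input_group2")
        for flag in PHASED_FLAGS_UNSUPPORTED_IN_GROUPS:
            if opts.get(flag):
                problems.append(f"--{flag} is not supported in group analysis")
    else:
        for name in ("input_file1", "input_file2"):
            if not opts.get(name):
                problems.append(f"--{name} is required")
    return problems


group_run = check_options(
    {"input_group1": "group1/", "input_group2": "group2/", "5mC": True, "output_dir": "results/"}
)
plots_run = check_options({"plots_only": True, "output_dir": "results/"})
print(group_run)
print(plots_run)
```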

To run the full DMR analysis workflow between two samples:

# Use --5mC, --6mA, or --4mC depending on the modification being analysed.
# If --gene_list is not provided, all regions will be plotted.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
  --input_file1 /path/to/sample_1_bedmethyl.bed \
  --input_file2 /path/to/sample_2_bedmethyl.bed \
  --5mC \
  --input_modBam1 /path/to/sample_1_modBam.bam \
  --input_modBam2 /path/to/sample_2_modBam.bam \
  --output_dir /path/to/output_dir \
  --gene_list /path/to/gene_list.txt

To run the full DMR analysis workflow between two haplotypes:

# Use --phased_mC, --phased_mA, or --phased_hmC depending on the modification.
# If --gene_list is not provided, all regions will be plotted.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
  --input_file1 /path/to/haplotype_1_bedmethyl.bed \
  --input_file2 /path/to/haplotype_2_bedmethyl.bed \
  --phased_mC \
  --phased_modBam /path/to/phased_modBam.bam \
  --output_dir /path/to/output_dir \
  --gene_list /path/to/gene_list.txt

To run plots-only mode for DMRs identified between two samples:

nextflow run ont-methylDMR-kit/main.nf -profile standard \
  --plots_only \
  --annotated_dmrs /path/to/dmrs_table_annotated.bed \
  --input_modBam1 /path/to/sample_1_modBam.bam \
  --input_modBam2 /path/to/sample_2_modBam.bam \
  --output_dir /path/to/output_dir \
  --gene_list /path/to/gene_list.txt

To run plots-only mode for DMRs identified between haplotypes:

# Use --phased_mC, --phased_mA, or --phased_hmC depending on the modification.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
  --plots_only \
  --phased_mC \
  --annotated_dmrs /path/to/dmrs_table_annotated.bed \
  --phased_modBam /path/to/phased_modBam.bam \
  --output_dir /path/to/output_dir \
  --gene_list /path/to/gene_list.txt

To run DMR analysis between two groups of bedmethyl files:

# Each group directory must contain bedmethyl files with the .bed extension.
# Use --5mC, --6mA, or --4mC depending on the modification being analysed.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
  --input_group1 /path/to/group_1_bedmethyls \
  --input_group2 /path/to/group_2_bedmethyls \
  --5mC \
  --output_dir /path/to/output_dir
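Because group analysis only picks up files with the .bed extension, it can help to confirm what each group directory contains before launching a run. The Python sketch below is a hypothetical pre-flight check, not part of the pipeline; the function name `list_group_bedmethyls` is an assumption, and the demonstration uses a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path


def list_group_bedmethyls(group_dir):
    """Return the sorted *.bed files in a group directory, or raise if none exist."""
    files = sorted(Path(group_dir).glob("*.bed"))
    if not files:
        raise FileNotFoundError(f"no *.bed files found in {group_dir}")
    return files


# Demonstration with placeholder files; b.bed.gz is skipped because
# it does not end in .bed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.bed", "b.bed.gz", "c.bed"):
        (Path(tmp) / name).touch()
    names = [p.name for p in list_group_bedmethyls(tmp)]
print(names)
```

Compressed or differently named bedmethyl files would be skipped in the same way, so renaming or decompressing them first avoids silently shrinking a group.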

Publications

  1. Nyaga, D.M., Tsai, P., Gebbie, C. et al. Benchmarking nanopore sequencing and rapid genomics feasibility: validation at a quaternary hospital in New Zealand. npj Genom. Med. 9, 57 (2024). https://doi.org/10.1038/s41525-024-00445-5
