ont-methylDMR-kit is a pipeline for calling differentially methylated regions (DMRs) between haplotypes, between two samples, or between two groups, coupled with annotation and visualisation. It works on long-read (ONT) sequencing bedmethyl files generated using modkit. The pipeline is inspired by my work on rare disorders, and by the potential of long-read sequencing to comprehensively identify modified bases such as 5-Methylcytosine (5mC), 5-Hydroxymethylcytosine (5hmC), N6-methyladenine (6mA), and N4-methylcytosine (4mC), which have been identified in a growing number of rare disorders and imprinted disorders.
The pipeline is built using Nextflow, a bioinformatics workflow manager that enables the development of portable and reproducible workflows. Two Docker images, both available from DockerHub, contain all the tools and software required by the pipeline, making results highly reproducible.
DMR analysis is performed using the R package DSS. It supports calling DMRs across 5-Methylcytosine (--5mC), 5-Hydroxymethylcytosine (--5hmC), N6-methyladenine (--6mA), and N4-methylcytosine (--4mC) modified bases. It also supports haplotype-specific DMRs using --phased_mC, --phased_mA, and --phased_hmC. Provide raw bedmethyl files generated using modkit, using either --input_file1 and --input_file2 for calling DMRs between haplotypes or two samples, or --input_group1 and --input_group2 for group analysis (bedmethyl files in these two folders must have the *.bed extension). Note that the --phased_mC and --phased_hmC flags are not supported in group analysis; use --5mC, --4mC, or --6mA separately, depending on the type of methylation being analysed. Methyl positions with fewer than 5 reads are filtered out by default.
The current default options for DSS are: delta (threshold for defining a DMR) of 10%; p-value threshold for calling a DMR of 0.01; minimum length required for a DMR of 100 base pairs; minimum number of CpG sites required for a DMR of 10; and merging of two DMRs that are less than 100 base pairs apart. To change these parameters, edit main.nf in process dmr_calling, process group_dmr_calling_5mC, or process group_dmr_calling_6mA. See DSS for more information.
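To illustrate the default coverage filter described above, here is a minimal shell sketch that keeps only bedmethyl rows for one modification code with a valid coverage of at least 5. It assumes the modkit bedmethyl layout, in which column 4 holds the modification code (for example m for 5mC, h for 5hmC, a for 6mA) and column 10 holds the valid coverage. The function name filter_bedmethyl is hypothetical, and the pipeline's own filtering may be implemented differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# Reads a modkit bedmethyl on stdin and prints rows that
#   - carry the requested modification code in column 4, and
#   - have a valid coverage (column 10, assumed) of at least the minimum.
# $1 = modification code (e.g. m, h, a); $2 = minimum coverage (default 5).
filter_bedmethyl() {
  # awk's default field splitting accepts both tabs and spaces as separators.
  awk -v code="$1" -v min_cov="${2:-5}" \
    '$4 == code && ($10 + 0) >= (min_cov + 0)'
}
```

For example, `filter_bedmethyl m 5 < sample_1.bed > sample_1.5mC.cov5.bed` would write the retained 5mC rows to a new file (both file names are placeholders).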
Significant DMRs are annotated to provide information on whether DMRs overlap with promoters, exons, and introns. A compressed file, annotations.zip, is provided with the pipeline; it contains the annotation information, which is based on GENCODE v44, and needs to be extracted before use (tar -xvf annotations.zip).
Annotated DMRs are plotted using modbamtools. It supports haplotype-specific DMR plotting (by providing haplotagged modified BAM files using --phased_modBam) and plotting DMRs between two samples (by providing modified BAM files using --input_modBam1 and --input_modBam2), but does not support plotting DMRs from group analysis. You can provide a gene list with --gene_list to plot only the significant DMRs for the provided genes (if present).
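As a rough illustration of what restricting plots to a gene list amounts to, the sketch below selects rows of a tab-separated DMR table whose gene name appears in a gene list. It makes several assumptions that the pipeline does not confirm: that the gene list has one gene symbol in its first column, that the annotated DMR table stores the gene name in a single column, and that you pass that column's 1-based index yourself. The function name filter_dmrs_by_genes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
# $1 = gene list (one gene symbol in the first column; assumed layout)
# $2 = tab-separated annotated DMR table
# $3 = 1-based index of the column holding the gene name (assumed; check your file)
filter_dmrs_by_genes() {
  awk -F'\t' -v col="$3" '
    NR == FNR { genes[$1]; next }   # first file: remember every gene symbol
    (($col) in genes)               # second file: print rows whose gene is listed
  ' "$1" "$2"
}
```

A call such as `filter_dmrs_by_genes gene_list.tsv dmrs_table_annotated.bed 4` would print only the matching rows; the column index 4 is purely illustrative, and the gene list must not be empty for the two-file awk idiom to work.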
There is a plots-only mode, triggered by the flag --plots_only, which requires an annotated DMR bed file passed with --annotated_dmrs (generated by process annotate_dmrs) and modified BAM files.
$ git clone https://github.com/NyagaM/ont-methylDMR-kit.git
$ cd ont-methylDMR-kit
$ tar -xvf annotations.zip
To view usage and run options:
$ nextflow run main.nf --help
Usage: nextflow run main.nf [options]
Options:
--input_file1 First input bedmethyl file from sample_1 (for DMRs between two samples) or haplotype_1 bedmethyl (for DMRs between haplotypes) (required)
--input_file2 Second input bedmethyl file from sample_2 (for DMRs between two samples) or haplotype_2 bedmethyl (for DMRs between haplotypes) (required)
--output_dir Output directory (required)
--input_modBam1 First input modified BAM used to generate bedmethyl for sample_1 (optional)
--input_modBam2 Second input modified BAM used to generate bedmethyl for sample_2 (optional)
--input_group1 Directory containing bedmethyl files (with *.bed extension) for group 1 (optional)
--input_group2 Directory containing bedmethyl files (with *.bed extension) for group 2 (optional)
--gene_list A list of genes as tsv file to generate plots on (optional)
--5mC Use this flag to trigger 5mC DMR calling (optional)
--5hmC Use this flag to trigger 5hmC DMR calling (optional)
--6mA Use this flag to trigger 6mA DMR calling (optional)
--4mC Use this flag to trigger 4mC DMR calling (optional)
--phased_mC Use this flag to trigger haplotagged 5mC/4mC DMR calling if input files are haplotagged bedmethyls (optional)
--phased_mA Use this flag to trigger haplotagged 6mA DMR calling if input files are haplotagged bedmethyls (optional)
--phased_hmC Use this flag to trigger haplotagged 5hmC DMR calling if input files are haplotagged bedmethyls (optional)
--phased_modBam Haplotagged modified BAM (required for plotting DMRs if --phased_mC/--phased_mA is used)
--plots_only Only run the plotting processes, requires --annotated_dmrs and BAM files
--annotated_dmrs Path to annotated DMR bed file (required if --plots_only is used)
--help Print this help message
To run the full DMR analysis workflow between two samples:
# Use --5mC, --6mA, or --4mC to select the modification.
# If --gene_list is not provided, all regions will be plotted.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
--input_file1 /path/to/sample_1_bedmethyl.bed \
--input_file2 /path/to/sample_2_bedmethyl.bed \
--5mC \
--input_modBam1 /path/to/sample_1_modBam.bam \
--input_modBam2 /path/to/sample_2_modBam.bam \
--output_dir /path/to/output_dir \
--gene_list /path/to/gene_list.txt
To run the full DMR analysis workflow between two haplotypes:
# Use --phased_mC, --phased_mA, or --phased_hmC to select the modification.
# If --gene_list is not provided, all regions will be plotted.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
--input_file1 /path/to/haplotype_1_bedmethyl.bed \
--input_file2 /path/to/haplotype_2_bedmethyl.bed \
--phased_mC \
--phased_modBam /path/to/phased_modBam.bam \
--output_dir /path/to/output_dir \
--gene_list /path/to/gene_list.txt
To run plots-only mode for DMRs identified between two samples:
nextflow run ont-methylDMR-kit/main.nf -profile standard \
--plots_only \
--annotated_dmrs /path/to/dmrs_table_annotated.bed \
--input_modBam1 /path/to/sample_1_modBam.bam \
--input_modBam2 /path/to/sample_2_modBam.bam \
--output_dir /path/to/output_dir \
--gene_list /path/to/gene_list.txt
To run plots-only mode for DMRs identified between haplotypes:
# Use --phased_mC, --phased_mA, or --phased_hmC to match the modification analysed.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
--plots_only \
--phased_mC \
--annotated_dmrs /path/to/dmrs_table_annotated.bed \
--phased_modBam /path/to/phased_modBam.bam \
--output_dir /path/to/output_dir \
--gene_list /path/to/gene_list.txt
To run DMR analysis between two groups of bedmethyl files:
# Each group directory must contain bedmethyl files with the .bed extension.
# Use --5mC, --6mA, or --4mC to select the modification.
nextflow run ont-methylDMR-kit/main.nf -profile standard \
--input_group1 /path/to/group_1_bedmethyls \
--input_group2 /path/to/group_2_bedmethyls \
--5mC \
--output_dir /path/to/output_dir
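Because group analysis only picks up files with the .bed extension, it can help to confirm that each group directory actually contains such files before launching the run. The following is a minimal pre-flight sketch; the function names count_bed_files and check_group_dir are hypothetical and are not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers, not part of the pipeline.

# Print how many entries directly inside directory $1 end in .bed.
count_bed_files() {
  count=0
  for f in "$1"/*.bed; do
    # When nothing matches, the unexpanded pattern does not exist, so it is skipped.
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# Report the count for directory $1, or fail if no .bed files are present.
check_group_dir() {
  n=$(count_bed_files "$1")
  if [ "$n" -eq 0 ]; then
    echo "No .bed files found in $1" >&2
    return 1
  fi
  echo "$1: $n bedmethyl file(s)"
}
```

Running `check_group_dir /path/to/group_1_bedmethyls && check_group_dir /path/to/group_2_bedmethyls` before the nextflow command would stop early if either directory has no .bed files.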
- Nyaga, D.M., Tsai, P., Gebbie, C. et al. Benchmarking nanopore sequencing and rapid genomics feasibility: validation at a quaternary hospital in New Zealand. npj Genom. Med. 9, 57 (2024). https://doi.org/10.1038/s41525-024-00445-5