Conda package for the Mutation Attention (MuAt) deep learning tool for tumour type and subtype classification
-
Clone the muat Repository
git clone https://github.com/primasanjaya/muat.git
-
Navigate to the muat Directory.
cd muat
-
Create the Conda Environment.
To create the conda environment, run:
conda env create -f muat-env.yml
-
Activate the Conda Environment.
After creating the environment, activate it with:
conda activate muat-env
-
Install muat
Install muat from the bioconda channel:
conda install bioconda::muat
-
Verify the Installation
To test whether the installation was successful, run:
muat -h
You will see:
Mutation Attention Tool
positional arguments:
{download,preprocessing,predict,train,benchmark}
Available commands
download Download the dataset.
preprocess Preprocess the dataset.
predict Predict samples.
train Train the MuAt model.
predict-ensemble Run the prediction using the best MuAt ensemble models
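The same check can be scripted, for instance at the top of a setup script. The following is a minimal sketch and not part of muat: `require_cmd` is a hypothetical helper name, and it only tests that a command is on `PATH`, not that the tool itself works.

```shell
#!/bin/sh
# Hypothetical helper (not part of muat): succeed if a command is on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

For example, `require_cmd conda && require_cmd muat && muat -h` prints the help text only when both tools are available in the active environment.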
You can build the Docker container from source by running build_docker.sh,
or you can use the prebuilt one from https://biocontainers.pro/tools/muat
An example SNV/MNV VCF file is provided at example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz.
This file uses the hg19 reference genome.
💡 Tip: use absolute paths (not relative paths) to ensure successful execution.
To run prediction on this file, use exactly this command:
(muat-env)$ muat predict wgs --hg19 genome_reference/hg19.fa --mutation-type 'snv+mnv' --input-filepath 'example_files/0a6be23a-d5a0-4e95-ada2-a61b2b5d9485.consensus.20160830.somatic.snv_mnv.vcf.gz' --result-dir results
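Following the tip about absolute paths, relative paths can be resolved before they are passed to muat. This is a minimal sketch for a POSIX shell; `abs_path` is a hypothetical helper name, not a muat feature, and it assumes the parent directory of the path already exists.

```shell
#!/bin/sh
# Hypothetical helper (not part of muat): print the absolute form of a path.
# The parent directory must exist; the file itself need not.
abs_path() {
    # Resolve the parent directory in a subshell so the caller's
    # working directory is left unchanged.
    dir=$(CDPATH= cd "$(dirname "$1")" && pwd) || return 1
    # ${dir%/} avoids a double slash when the parent is the root directory.
    echo "${dir%/}/$(basename "$1")"
}
```

For example, `--hg19 "$(abs_path genome_reference/hg19.fa)"` passes the resolved reference path to the prediction command.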
To run prediction on VCF files that use hg38, run:
(muat-env)$ muat predict wgs --hg38 '/path/to/genome_reference/hg38.fa' --mutation-type 'snv+mnv' --input-filepath 'path/to/sample.vcf.gz' --result-dir 'path/to/result_dir/'
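To process several VCF files, one option is to call the hg38 command once per file in a loop. The sketch below makes several assumptions: `predict_all` and `MUAT_BIN` are hypothetical names introduced here, not muat features, and whether results from several samples can share one result directory is not stated in this README, so use one directory per sample if in doubt.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of muat): run the hg38 prediction command
# once for every *.vcf.gz file in a directory.
# MUAT_BIN is an illustrative variable, not a muat setting; set it to
# "echo" to print the commands instead of running them.
predict_all() {
    ref=$1
    in_dir=$2
    out_dir=$3
    for vcf in "$in_dir"/*.vcf.gz; do
        # When nothing matches, the glob stays literal; skip it.
        [ -e "$vcf" ] || continue
        "${MUAT_BIN:-muat}" predict wgs --hg38 "$ref" \
            --mutation-type 'snv+mnv' \
            --input-filepath "$vcf" --result-dir "$out_dir" || return 1
    done
}
```

A dry run such as `MUAT_BIN=echo; predict_all /path/to/genome_reference/hg38.fa /path/to/vcfs /path/to/result_dir` shows the commands that would be executed.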
To run prediction on already preprocessed samples (see README_preprocessing.md for the preprocessing steps), run:
(muat-env)$ muat predict wgs --no-preprocessing --mutation-type 'snv+mnv' --input-filepath 'path/to/sample.token.gc.genic.exonic.cs.tsv.gz' --result-dir 'path/to/result_dir/'
Example command to predict samples using the best MuAt ensemble models:
(muat-env)$ muat predict-ensemble muat-wgs --hg19 '/path/to/genome_reference/hg19.fa' --mutation-type 'snv+mnv' --input-filepath 'path/to/sample.vcf.gz' --result-dir 'path/to/result_dir/'
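Before starting a prediction run, it can help to confirm that the reference FASTA and the input file are readable. The following is a minimal sketch; `check_readable` is a hypothetical helper name and not part of muat.

```shell
#!/bin/sh
# Hypothetical helper (not part of muat): fail on the first path that is
# missing or not readable.
check_readable() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "not readable: $f" >&2
            return 1
        fi
    done
}
```

For example, `check_readable /path/to/genome_reference/hg19.fa path/to/sample.vcf.gz` can be chained with `&&` in front of the `muat predict-ensemble` command so that the prediction only starts when both files are present.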
- Download PCAWG: Read README_download.md for details on downloading the PCAWG dataset.
- Preprocessing: Read README_preprocessing.md for details on preprocessing.
- General Training: Read README_MuAtTraining.md for general training instructions.
- Full Training of PCAWG Dataset: Read README_PCAWG.md for full training instructions on the PCAWG dataset.
- Training and Predicting Genomics England Dataset: Read README_GEL.md for complete training and prediction instructions on the Genomics England dataset.