Description
Hello,
I ran the new version of vsearch (2.28.1) with `--sintax` on a computing cluster, but processing was extremely slow despite the resources requested (40 threads, 4775 MB of memory per core). The input ASV FASTA file is 4.6 MB and contains 10,710 ASVs; the reference database is the complete Eukaryote COI BOLD database (1.7 GB, 2,216,285 sequences).
vsearch ran for 13 days but only output a 72.7 KB one-column file, which seems to indicate that only 6,236 ASVs were processed. Below are the head and tail of the output file:
ASV_7
ASV_20
ASV_16
ASV_17
ASV_10
ASV_19
ASV_12
ASV_34
ASV_35
ASV_9
...
ASV_6228
ASV_6229
ASV_6230
ASV_6231
ASV_6232
ASV_6233
ASV_6234
ASV_6235
ASV_6236
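For reference, here is a minimal sketch of how the number of processed ASVs can be compared with the number of input ASVs. The helper names `count_fasta_records` and `count_classified_queries` are hypothetical, and the sketch assumes the query label is in the first tab-separated column of the output file.

```shell
#!/bin/bash

# Count FASTA records, i.e. header lines starting with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count distinct query labels in the first tab-separated column
# of a tabular output file (assumed layout, one query per line).
count_classified_queries() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}
```

Calling `count_fasta_records ASVs_Malaise_traps_DADA2.fasta` and `count_classified_queries rdp_sintax_unoise3_COI.txt` gives the two numbers to compare.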
Here is the script for the .sh file used to run vsearch:
```bash
#!/bin/bash
#SBATCH --mem-per-cpu=4775M
#SBATCH --cpus-per-task=40
#SBATCH --time=48:00:00
#SBATCH --account=def-mcristes
#SBATCH --mail-user=mathilde.salamon@mcgill.ca
#SBATCH --mail-type=ALL

module load StdEnv/2020 vsearch/2.28.1

# Run VSEARCH
vsearch --sintax ASVs_Malaise_traps_DADA2.fasta \
    --sintax_random \
    --db SINTAX_COI_v5.1.0ref.fasta \
    --tabbedout rdp_sintax_unoise3_COI.txt \
    --sintax_cutoff 0.8 \
    --strand both \
    --threads 40 \
    --log sintax_COI_MalaiseTraps_log.txt
```
I am unsure why the program was so slow. Could this be due to the very large reference database?
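One way to narrow this down might be to time the same command on a small subset of the ASVs. The sketch below is only an illustration: `fasta_head` is a hypothetical helper, the subset size of 100 and the `subset` file names are arbitrary, and the commented vsearch call reuses only the options from my script.

```shell
#!/bin/bash

# Print the first N records of a FASTA file to stdout.
# Multi-line sequences are kept whole: output stops at header N+1.
fasta_head() {
    awk -v n="$2" '/^>/ { seen++ } seen > n { exit } { print }' "$1"
}

# Example use (not run here; file names are placeholders):
#   fasta_head ASVs_Malaise_traps_DADA2.fasta 100 > subset.fasta
#   time vsearch --sintax subset.fasta \
#       --db SINTAX_COI_v5.1.0ref.fasta \
#       --tabbedout subset_sintax.txt \
#       --sintax_cutoff 0.8 \
#       --strand both \
#       --threads 40
```

The elapsed time divided by the subset size would give a rough per-ASV rate to compare against the 13-day run.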
Thank you for your help,
Best wishes,
Mathilde Salamon