The processes for splitting files are running very slowly with large numbers of input samples

Hello,

We are attempting to run this pipeline with a rather complex scenario: 850 contrasts and 3400 samples. The processes SPLIT_FILES_TPM, SPLIT_FILES_IOE and SPLIT_FILES_IOI appear to be running very slowly.

For example, SPLIT_FILES_TPM has been running for 12 hours and has produced only 61 TPM files so far. At this rate, the process would take around 4 weeks to finish, before we could even attempt to run Suppa2...
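
For reference, the 4-week figure is a simple linear extrapolation. The sketch below assumes that the process writes one TPM file per sample (3400 in total) and that throughput stays constant. Both are assumptions on our side, and the function name is only illustrative.

```python
def estimate_total_hours(files_done: int, hours_elapsed: float, files_total: int) -> float:
    """Linearly extrapolate total runtime, assuming a constant files-per-hour rate."""
    rate = files_done / hours_elapsed  # files produced per hour so far
    return files_total / rate


# Observed: 61 TPM files after 12 hours.
# Assumed: one output file per sample, i.e. 3400 files in total.
total_hours = estimate_total_hours(files_done=61, hours_elapsed=12, files_total=3400)
total_days = total_hours / 24
total_weeks = total_days / 7

print(f"{total_hours:.0f} h = {total_days:.1f} days = {total_weeks:.1f} weeks")
```

Under these assumptions the estimate comes out at just under 28 days for SPLIT_FILES_TPM alone.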

We have successfully run these samples through nf-core/rnaseq (which took less than a week on 512 cores and 2 TB of RAM), and nf-core/differentialabundance (which only took 2 hours to run DESeq2). We now intend to run Suppa2 via this pipeline.

Any advice for speeding things up would be much appreciated. This is the command that we are running:

nextflow run nf-core/rnasplice -r 1.0.2 -c custom.config -params-file params.yml --igenomes_ignore --genome null \
--input /path/to/sample-sheet-rnasplice.csv \
--outdir run-rnasplice -profile docker --source salmon_results \
--fasta /path/to/genome/gencode_v45_spike-ins.fasta \
--gtf /path/to/genome/gencode_v45_spike-ins.gtf \
--star_index /path/to/genome/index/star \
--salmon_index /path/to/genome/index/salmon \
--dexseq_exon false --dexseq_dtu false \
--suppa --suppa_per_local_event \
--contrasts /path/to/contrasts-rnasplice.csv \
--sashimi_plot false
