Enhancements for paired-end short-read RNAseq alignment

Feature requests are the most annoying kind of GitHub issue, but I am very interested in using minimap2 to align short-read RNAseq data to genomes with inserted SNP and indel variation. Minimap2's fast indexing could be critical for doing this kind of analysis at a population level.

After experimenting with minimap2's short-read RNAseq settings, I have one or two feature suggestions and a few questions about how minimap2 handles reads that align equally well to multiple regions of the genome.

Many RNAseq datasets are generated using strand-specific kits, which generate data from the antisense strand on read1 and the sense strand on read2. Right now, we have the option of aligning all reads to either the fwd or the rev strand, but it would be beneficial to have a paired-end option that aligns read1 and read2 to opposite strands (fwd/rev or rev/fwd).
Even when I can confidently assume that my data does not contain novel chromosomal recombinations, I do not know what to do with read pairs whose mates best align to different chromosomes. While these pairs can result from inaccuracies in genome construction, I am seeing 1.6% of my read pairs aligning to different chromosomes when using the CHM13v2.0 reference genome. It would be helpful to be able to penalize such pairs more heavily. Is that already possible?
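For anyone wanting to reproduce the cross-chromosome figure on their own data, here is a minimal sketch of how I would count such pairs from SAM text. It relies only on the standard SAM columns (FLAG, RNAME, RNEXT); the function name `cross_chromosome_pair_fraction` is my own and not part of minimap2 or any library. It counts each pair once through the primary record of read1 and ignores pairs where either mate is unmapped, which may differ from how other tools define the denominator.

```python
def cross_chromosome_pair_fraction(sam_lines):
    """Fraction of fully mapped read pairs whose mates lie on different references.

    Hypothetical helper: reads plain SAM text lines, no external libraries.
    """
    total = 0
    cross = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        # Count each pair once, using read1 (0x40) of a paired read (0x1).
        if not (flag & 0x1) or not (flag & 0x40):
            continue
        # Skip unmapped reads/mates (0x4, 0x8), secondary (0x100)
        # and supplementary (0x800) records.
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue
        total += 1
        # RNEXT is "=" when the mate is on the same reference as RNAME.
        if rnext not in ("=", "*") and rnext != rname:
            cross += 1
    return cross / total if total else 0.0
```

Usage would be something like `cross_chromosome_pair_fraction(open("aln.sam"))`, where `aln.sam` is a placeholder file name.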
Given the prevalence of gene duplications and pseudogene transcripts, it is common for short read RNAseq sequences to align equally well to multiple transcripts or multiple genes. Downstream quantification tools often use these multimapping reads to estimate quantification uncertainty due to mapping error. By default, the -x splice:sr preset prevents any secondary alignments from being reported.
If I manually add this ability back and try to mimic STAR's multimapping tolerances (--secondary=yes -N 20 -p 1), I see secondary alignment counts that are more than ten times higher than those generated by STAR alignment of the same reads to the same reference genome (non-primary alignments are 9% of total read count with STAR vs 121% of total read count with minimap2 and the described settings). I am clearly configuring minimap2 to be too eager to find secondary alignments!
What settings would you suggest? I would like to report all secondary alignments with scores equal to the primary alignment, up to a maximum of 10 or 20 equally plausible alignments.
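To make the desired behaviour concrete, here is a rough post-hoc filter that approximates what I am after. This is a sketch under assumptions, not a suggestion for how minimap2 should implement it: it assumes every aligned record carries an `AS:i` alignment-score tag, it treats "equally plausible" as an exactly equal `AS` value, and the function names are hypothetical. It keeps all non-secondary records and retains at most `max_secondary` secondary records per read whose score matches that read's primary alignment.

```python
def get_as(fields):
    """Return the integer AS:i tag of a SAM record, or None if absent."""
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    return None


def read_key(fields):
    """Identify a read by name plus its first/last-in-pair bits (0x40, 0x80)."""
    return (fields[0], int(fields[1]) & 0xC0)


def keep_equal_score_secondaries(sam_lines, max_secondary=20):
    """Drop secondary records unless their AS equals the primary's AS.

    At most max_secondary secondaries are kept per read. Header lines and
    all non-secondary records pass through unchanged.
    """
    records = [l.rstrip("\n") for l in sam_lines if l.strip()]

    # First pass: record the primary alignment score of every read.
    primary_score = {}
    for line in records:
        if line.startswith("@"):
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & (0x4 | 0x100 | 0x800):
            continue  # unmapped, secondary, or supplementary
        primary_score[read_key(fields)] = get_as(fields)

    # Second pass: keep qualifying secondaries up to the cap.
    kept = []
    n_secondary = {}
    for line in records:
        if line.startswith("@"):
            kept.append(line)
            continue
        fields = line.split("\t")
        if not int(fields[1]) & 0x100:
            kept.append(line)
            continue
        key = read_key(fields)
        score, best = get_as(fields), primary_score.get(key)
        if score is None or best is None or score != best:
            continue
        if n_secondary.get(key, 0) >= max_secondary:
            continue
        n_secondary[key] = n_secondary.get(key, 0) + 1
        kept.append(line)
    return kept
```

Filtering after the fact obviously does not save any alignment time, so I would still prefer a combination of settings that avoids generating the extra secondary alignments in the first place.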