Does longphase support SNVs from illumina/short-reads · Issue #88 · twolinin/longphase · GitHub
eesiribloom opened this issue Sep 23, 2024 · 5 comments

@eesiribloom

Given the higher accuracy of Illumina short-read SNV/indel calls compared to ONT long-read calls, does longphase support phasing short-read SNVs using long-read alignments (BAMs)?

@ythuang0522
Collaborator

Hi @eesiribloom, the data used for variant calling (e.g., a VCF from Illumina) can come from a different platform than the data used for phasing (e.g., a BAM from ONT), as long as both use the same reference genome.
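
One quick sanity check for the "same reference genome" condition is to compare contig names between the VCF and the reference index. The sketch below is only illustrative: the helper names are made up for this example, it assumes a samtools-style `.fai` index exists for the reference, and matching names do not prove that the underlying sequences are identical.

```python
import re


def vcf_contigs(vcf_lines):
    """Collect contig names from ##contig header lines and record CHROM fields."""
    names = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##contig="):
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                names.add(match.group(1))
        elif line and not line.startswith("#"):
            # Data record: CHROM is the first tab-separated column.
            names.add(line.split("\t", 1)[0])
    return names


def fai_contigs(fai_lines):
    """Collect contig names from a .fai index (first tab-separated column)."""
    return {
        line.rstrip("\n").split("\t", 1)[0]
        for line in fai_lines
        if line.strip()
    }


def contigs_missing_from_reference(vcf_lines, fai_lines):
    """Return VCF contig names that do not appear in the reference index."""
    return sorted(vcf_contigs(vcf_lines) - fai_contigs(fai_lines))


# Hypothetical usage with plain-text files (paths are placeholders):
# with open("calls.vcf") as vcf, open("reference.fa.fai") as fai:
#     print(contigs_missing_from_reference(vcf, fai))
```

A non-empty result (for example `1` in the VCF versus `chr1` in the reference) suggests the two inputs were not produced against the same reference naming scheme.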

@cteng585

Hi @ythuang0522, thank you for making this great tool!

I wanted to follow up on this issue since I'm having some difficulty getting LongPhase to work with short-read variant calls and long-read alignments.

Both my short-read VCFs (Illumina sequencing/Mutect2 variant caller) and my long-read alignment files (ONT sequencing) are aligned to the same reference genome (hg38). However, when I try to run the following:

longphase phase \
-s short_read.snvs.vcf \
-b long_read.bam \
-r ../reference.fa \
-t 8 \
-o test_phasing \
--ont

I get the following stdout:

LongPhase Ver 1.7.3

--- File Parameter --- 
SNP File      : short_read.snvs.vcf 
SV  File      : 
MOD File      : 
REF File      : ../reference.fa
Output Prefix : test_phasing
Generate Dot  : False
BAM File      : long_read.bam 

--- Phasing Parameter --- 
Seq Platform       : ONT
Phase Indel        : False
Distance Threshold : 300000
Connect Adjacent   : 20
Edge Threshold     : 0.7
Overlap Threshold  : 0.2
Mapping Quality    : 1
Variant Confidence : 0.75
ReadTag Confidence : 0.65

parsing VCF ... 0s
parsing SV VCF ... 0s
parsing Meth VCF ... 0s
reading reference ... 0s

parsing total:  0s
merge results ... 0s
writeResult SNP ... 1s

and it doesn't look like there's any change between the input and output VCFs.

Are there additional requirements that LongPhase looks for in the VCF to determine if the VCF is compatible with LongPhase?
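
To confirm whether an output VCF actually differs from its input in terms of phasing, one option is to count phased versus unphased genotypes. This is a minimal sketch, assuming an uncompressed VCF whose phased genotypes use the standard `|` separator in the GT field; the function name and the `sample_index` parameter are hypothetical and not part of LongPhase.

```python
def count_phased_genotypes(vcf_lines, sample_index=0):
    """Count records whose GT for one sample is phased ('|') or not."""
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        column = 9 + sample_index  # sample columns start after FORMAT
        if len(fields) <= column:
            continue  # record has no genotype column for this sample
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        values = fields[column].split(":")
        gt_position = keys.index("GT")
        if gt_position >= len(values):
            continue
        genotype = values[gt_position]
        # Genotypes without '|' (including haploid calls) count as unphased.
        counts["phased" if "|" in genotype else "unphased"] += 1
    return counts


# Hypothetical usage (path is a placeholder):
# with open("phased_output.vcf") as handle:
#     print(count_phased_genotypes(handle))
```

If the phased count is zero for the output file, no variant was placed into a phase block, which matches the "no change" observation above.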

@ythuang0522
Collaborator

Hi @cteng585, Mutect2 outputs only somatic variants, which are usually far fewer than germline variants. Because read-based phasing requires a long read to span at least two variants, phasing over such a small number of somatic variants alone will not work well (you can check the number of somatic variants first). One easy approach is to call germline variants (e.g., via GATK, DeepVariant, or Clair3) and somatic variants (e.g., via Mutect2 or ClairS) separately, and then merge the two VCFs for phasing. LongPhase will then have abundant variants for spanning and phasing.
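
The suggestion to check the number of variants first can be sketched as follows. This is an illustration only: the function name is made up, and `max_gap` is a hypothetical stand-in for a distance that a single long read could plausibly span, not a LongPhase parameter. It counts every record in an uncompressed VCF, whereas only heterozygous variants are informative for phasing, so treat the numbers as an upper bound.

```python
from collections import defaultdict


def spannable_pairs(vcf_lines, max_gap=10_000):
    """Per contig, return (variant count, adjacent pairs at most max_gap bp apart)."""
    positions = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos = line.split("\t", 2)[:2]
        positions[chrom].append(int(pos))

    summary = {}
    for chrom, pos_list in positions.items():
        pos_list.sort()
        # A pair of neighbouring variants is "spannable" if one read of
        # roughly max_gap bp could cover both positions.
        close = sum(
            1 for left, right in zip(pos_list, pos_list[1:])
            if right - left <= max_gap
        )
        summary[chrom] = (len(pos_list), close)
    return summary


# Hypothetical usage (path is a placeholder):
# with open("somatic_only.vcf") as handle:
#     for chrom, (n_variants, n_pairs) in spannable_pairs(handle).items():
#         print(chrom, n_variants, n_pairs)
```

A somatic-only VCF will typically show few or no close pairs, while a merged germline plus somatic VCF should show many.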

@cteng585
cteng585 commented Mar 3, 2025

Hi @ythuang0522, thank you for the quick response! I was able to get LongPhase to work once I merged my VCF files from Mutect2 and Clair3.

I had one other question I was hoping you could take the time to answer. In a previous ticket (#9), it was suggested that one could theoretically combine the alignment files from ONT and Illumina short-read sequencing. I tried to do this by merging an Illumina short-read CRAM and an ONT long-read CRAM, but LongPhase appears to hang during processing. Has any further work been done on this? Do you know where I might start troubleshooting?

Thank you again!

@twolinin
Owner
twolinin commented Mar 4, 2025

Hi @cteng585,

I tested the version you are using, and it should be able to run properly with CRAM files. If the CRAM index is not provided, there should be a corresponding message. If the reference FASTA file is missing the .fai index, the program will generate it automatically.

The troubleshooting steps I can think of are: first, convert the CRAM file to BAM and test with that; alternatively, run the Illumina and ONT CRAM or BAM files separately. I hope this helps. The two runs below show the output without and with a CRAM index, respectively.

LongPhase Ver 1.7.3

--- File Parameter ---
SNP File      : test.vcf.gz
SV  File      :
MOD File      :
REF File      : test.fa
Output Prefix : result
Generate Dot  : False
BAM File      : test.10x.cram

--- Phasing Parameter ---
Seq Platform       : ONT
Phase Indel        : False
Distance Threshold : 300000
Connect Adjacent   : 20
Edge Threshold     : 0.7
Overlap Threshold  : 0.2
Mapping Quality    : 1
Variant Confidence : 0.75
ReadTag Confidence : 0.65

parsing VCF ... 15s
parsing SV VCF ... 0s
parsing Meth VCF ... 0s
reading reference ... 29s
[E::cram_index_load] Could not retrieve index file for 'test.10x.cram'
ERROR: Cannot open index for bam file

LongPhase Ver 1.7.3

--- File Parameter ---
SNP File      : test.vcf.gz
SV  File      :
MOD File      :
REF File      : test.fa
Output Prefix : result
Generate Dot  : False
BAM File      : test.10x.cram

--- Phasing Parameter ---
Seq Platform       : ONT
Phase Indel        : False
Distance Threshold : 300000
Connect Adjacent   : 20
Edge Threshold     : 0.7
Overlap Threshold  : 0.2
Mapping Quality    : 1
Variant Confidence : 0.75
ReadTag Confidence : 0.65

parsing VCF ... 15s
parsing SV VCF ... 0s
parsing Meth VCF ... 0s
reading reference ... 28s
(chrY,11s)(chr21,17s)(chr22,18s)(chrX,22s)(chr18,23s)(chr20,23s)(chr17,26s)(chr19,27s)(chr14,31s)(chr15,31s)(chr8,41s)(chr10,47s)(chr9,47s)(chr13,49s)(chr16,50s)(chr12,55s)(chr7,56s)(chr11,59s)(chr6,63s)(chr3,65s)(chr4,67s)(chr2,67s)(chr5,67s)(chr1,85s)
parsing total:  85s
merge results ... 3s
writeResult SNP ... 37s

total process: 168s

Thanks
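
The index error in the first log above can be caught before launching a run by checking for an index file next to the alignment. This is a rough sketch, assuming the common naming conventions (`.crai` for CRAM; `.bai` or `.csi` for BAM, either appended to the file name or replacing its extension); the function name is hypothetical and the check does not validate the index contents.

```python
from pathlib import Path


def find_alignment_index(alignment_path):
    """Return the first index file found next to a BAM/CRAM, or None."""
    aln = Path(alignment_path)
    if aln.suffix == ".cram":
        extensions = [".crai"]
    else:
        extensions = [".bai", ".csi"]

    candidates = []
    for ext in extensions:
        candidates.append(Path(str(aln) + ext))  # e.g. sample.cram.crai
        candidates.append(aln.with_suffix(ext))  # e.g. sample.crai

    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


# Hypothetical usage (path is a placeholder):
# if find_alignment_index("merged.cram") is None:
#     print("No index found; index the alignment file before phasing.")
```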
