Haploid Haplotype Reconstruction · Issue #16 · kage-genotyper/kage · GitHub

Haploid Haplotype Reconstruction #16

Open
gsc74 opened this issue Sep 24, 2024 · 1 comment
gsc74 commented Sep 24, 2024

@sandve, I am working on reconstructing a haploid haplotype using the imputed genotypes from KAGE. Currently, I am using the following commands:

kage index -r MHC-CHM13.ref.fa -v MHC_49-MC.vcf.gz -o index -k 31
kage genotype -i index -r APD_10x.fastq --glimpse MHC_49-MC.vcf.gz -t 32 --average-coverage 10 -k 31 -o temp/APD_PG_genotyping.vcf
bgzip temp/APD_PG_genotyping.vcf && tabix -p vcf temp/APD_PG_genotyping.vcf.gz
bcftools view -e 'GT="het"' temp/APD_PG_genotyping.vcf.gz | bgzip > temp/APD_PG_genotyping_no_homo.vcf.gz && tabix -p vcf temp/APD_PG_genotyping_no_homo.vcf.gz
bcftools consensus  -f MHC-CHM13.ref.fa -o APD_rec_KG.fasta temp/APD_PG_genotyping_no_homo.vcf.gz

In the above commands, I am using haploid reads to obtain genotypes, then filtering out the heterozygous variants, and finally using the remaining imputed genotypes to reconstruct the haploid haplotype.
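
Since a haploid sample should in principle yield no true heterozygous calls, it may help to check how many records the filter actually drops. The following is a minimal sketch, not part of KAGE or bcftools: `classify_gt` and `tally_genotypes` are hypothetical helper names, the example records are made up for illustration, and the classification only roughly mirrors what a `GT="het"` filter would select.

```python
from collections import Counter

def classify_gt(gt: str) -> str:
    """Classify a VCF GT string as 'hom_ref', 'hom_alt', 'het', or 'missing'."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"

def tally_genotypes(vcf_lines, sample_index=0):
    """Count genotype classes for one sample across uncompressed VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        sample_values = fields[9 + sample_index].split(":")
        gt = sample_values[format_keys.index("GT")]
        counts[classify_gt(gt)] += 1
    return counts

# Made-up records for illustration only (not real KAGE output).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tAPD\n",
    "chr6\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\n",
    "chr6\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\n",
    "chr6\t300\t.\tG\tGA\t.\tPASS\t.\tGT:GQ\t0/1:12\n",
    "chr6\t400\t.\tT\tC\t.\tPASS\t.\tGT\t1|1\n",
]
counts = tally_genotypes(example)
print(dict(counts))
```

For a real file, the lines could be read with `gzip.open(path, "rt")`, which also handles bgzip-compressed VCFs. A large `het` fraction would suggest that much of the sample's variation is being discarded before the consensus step.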

My question is: is this the correct way to use KAGE to reconstruct haplotypes? The input multiallelic VCF was converted to biallelic with the bcftools norm -m -any command.

Collaborator
ivargr commented Sep 25, 2024

Hi!

Interesting question, but unfortunately I don't have much knowledge of or experience with haplotype reconstruction. I have a feeling your approach makes sense and could work, but I guess the only way to know would be to do some tests/benchmarks where you simulate reads from a single haplotype and check whether you actually get that haplotype back. One way could be to just simulate reads from the reference genome. Then you would expect your pipeline to give something very similar to the reference genome in the end (and kage to call most genotypes as 0/0).
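
A rough sketch of such a round-trip test, under simplifying assumptions: reads are error-free, forward-strand only, and sampled uniformly, and the comparison is alignment-free (shared 31-mers) rather than a proper alignment. All function names here (`simulate_reads`, `to_fastq`, `kmer_recall`) are hypothetical helpers, not part of KAGE, and a dedicated read simulator would be more realistic.

```python
import random

def simulate_reads(haplotype, read_len=150, coverage=10, seed=0):
    """Sample error-free, forward-strand reads uniformly from one haplotype."""
    last_start = len(haplotype) - read_len
    if last_start < 0:
        raise ValueError("haplotype is shorter than read_len")
    rng = random.Random(seed)
    n_reads = max(1, coverage * len(haplotype) // read_len)
    starts = (rng.randint(0, last_start) for _ in range(n_reads))
    return [haplotype[s:s + read_len] for s in starts]

def to_fastq(reads, qual_char="I"):
    """Format reads as FASTQ text with a constant placeholder quality."""
    return "".join(
        f"@sim_{i}\n{read}\n+\n{qual_char * len(read)}\n"
        for i, read in enumerate(reads)
    )

def kmers(seq, k=31):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_recall(truth, reconstructed, k=31):
    """Fraction of the truth haplotype's distinct k-mers found in the reconstruction."""
    truth_kmers = kmers(truth, k)
    if not truth_kmers:
        raise ValueError("truth is shorter than k")
    return len(truth_kmers & kmers(reconstructed, k)) / len(truth_kmers)

# Toy demonstration on a random sequence standing in for the true haplotype.
rng = random.Random(1)
truth = "".join(rng.choice("ACGT") for _ in range(2000))
reads = simulate_reads(truth, read_len=100, coverage=10)
fastq_text = to_fastq(reads)  # would be written to disk and passed to the pipeline

# A perfect reconstruction recovers every k-mer of the truth.
print(kmer_recall(truth, truth))

# A reconstruction with a single wrong base loses the k-mers spanning it.
wrong_base = "A" if truth[1000] != "A" else "C"
imperfect = truth[:1000] + wrong_base + truth[1001:]
print(kmer_recall(truth, imperfect))
```

In a real benchmark the simulated FASTQ would go through the genotyping and consensus steps, and the resulting FASTA would be compared against the haplotype the reads came from. Holding all k-mers in memory grows with sequence length, so for a multi-megabase region an alignment-based comparison may be the better choice.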
