kage index throws an error · Issue #19 · kage-genotyper/kage · GitHub

kage index throws an error #19

Open
eblerjana opened this issue Mar 11, 2025 · 3 comments

Comments

@eblerjana
eblerjana commented Mar 11, 2025

Hi!

I'm running kage index on a biallelic VCF derived from a Minigraph-Cactus graph with 195 samples. This is the command I'm using:

kage index -r reference.fa -v variants.vcf -o results.npz -k 31 -t 24

However, I'm getting the following error:

INFO:root:Memory usage (Done variant stream): 270.7164 GB
Traceback (most recent call last):
  File "/usr/local/bin/kage", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/kage/command_line_interface.py", line 52, in main
    run_argument_parser(sys.argv[1:])
  File "/usr/local/lib/python3.10/site-packages/kage/command_line_interface.py", line 559, in run_argument_parser
    args.func(args)
  File "/usr/local/lib/python3.10/site-packages/kage/indexing/main.py", line 305, in make_index_cli
    r = make_index(args.reference, args.vcf, args.out_base_name,
  File "/usr/local/lib/python3.10/site-packages/kage/indexing/main.py", line 150, in make_index
    haplotype_matrix_original_vcf = SparseHaplotypeMatrix.from_vcf(variant_stream)
  File "/usr/local/lib/python3.10/site-packages/kage/indexing/sparse_haplotype_matrix.py", line 225, in from_vcf
    return cls.from_vcf2(vcf_file_name, dtype=dtype)
  File "/usr/local/lib/python3.10/site-packages/kage/indexing/sparse_haplotype_matrix.py", line 214, in from_vcf2
    matrix.extend(submatrix)
  File "/usr/local/lib/python3.10/site-packages/kage/indexing/sparse_haplotype_matrix.py", line 30, in extend
    self.data = scipy.sparse.vstack([self.data, other.data])
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/_construct.py", line 804, in vstack
    return _block([[b] for b in blocks], format, dtype, return_spmatrix=True)
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/_construct.py", line 944, in _block
    blocks = [[_stack_along_minor_axis(blocks[:, b], 0) for b in range(N)]]
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/_construct.py", line 944, in <listcomp>
    blocks = [[_stack_along_minor_axis(blocks[:, b], 0) for b in range(N)]]
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/_construct.py", line 672, in _stack_along_minor_axis
    raise ValueError(f'Mismatching dimensions along axis {other_axis}: '
ValueError: Mismatching dimensions along axis 1: {1, 390}

This is the full log:
index.log

Do you have any idea what could be causing this error?

Thanks!

Best,
Jana
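
A note on the numbers in the last line of the traceback: 390 equals 2 × 195, which would correspond to two haplotypes for each of the 195 samples mentioned above, so the mismatch is between a block with 390 columns and a block with a single column. The sketch below only reproduces this class of error with scipy.sparse.vstack on dummy matrices. The CSR format, the row counts, and the helper name try_stack are assumptions made for illustration and are not taken from kage's code; the exact wording of the error message depends on the SciPy version.

```python
import numpy as np
import scipy.sparse

N_SAMPLES = 195
N_HAPLOTYPES = 2 * N_SAMPLES  # 390, assuming diploid samples


def try_stack(n_cols_first, n_cols_second):
    """Stack two all-zero sparse blocks vertically (hypothetical helper).

    Returns the stacked shape, or the error text if SciPy refuses to stack.
    """
    first = scipy.sparse.csr_matrix(np.zeros((3, n_cols_first), dtype=np.uint8))
    second = scipy.sparse.csr_matrix(np.zeros((2, n_cols_second), dtype=np.uint8))
    try:
        return scipy.sparse.vstack([first, second]).shape
    except ValueError as err:
        return f"ValueError: {err}"


# Blocks with the same number of columns stack without problems.
print(try_stack(N_HAPLOTYPES, N_HAPLOTYPES))

# A block with a single column cannot be stacked under a 390-column block.
print(try_stack(N_HAPLOTYPES, 1))
```

Vertical stacking requires every block to have the same number of columns, so a single block of variants that was parsed into one column instead of 390 is enough to abort the whole run.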

@ivargr
Collaborator
ivargr commented Mar 14, 2025

Hi!

I have a feeling that something specific in the VCF is causing this error, probably something in the haplotype data. Would you by any chance be able to share the VCF with me? If not, I could try to add some more debugging output for when this error happens, which might pinpoint the problem.
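
One way to look for such a record outside of kage is to count the alleles in the GT columns of every line and report the lines that deviate. This is a minimal standard-library sketch, not kage's own parsing: the function name find_odd_records, the diploid assumption used to derive the expected count from the header, and the toy VCF are all hypothetical, and nothing in this thread establishes that such a record is what triggers the error.

```python
import io


def find_odd_records(vcf_lines, expected_haplotypes=None):
    """Yield (chrom, pos, n_alleles) for records whose GT columns do not
    contain the expected number of alleles (hypothetical helper)."""
    n_expected = expected_haplotypes
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue  # skip meta lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            if n_expected is None:
                # Assumption: every sample is diploid.
                n_expected = 2 * (len(fields) - 9)
            continue
        if n_expected is None:
            raise ValueError("no #CHROM header seen and no expected count given")
        fmt = fields[8].split(":") if len(fields) > 8 else []
        n_alleles = 0
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            for sample in fields[9:]:
                gt = sample.split(":")[gt_idx]
                # Phased (|) and unphased (/) separators both delimit alleles.
                n_alleles += len(gt.replace("/", "|").split("|"))
        if n_alleles != n_expected:
            yield fields[0], int(fields[1]), n_alleles


# Toy input: two samples, so four alleles are expected per record.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
    "chr1\t100\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\t1|0\n"
    "chr1\t200\t.\tG\tC\t.\tPASS\t.\tGT\t0\t1|1\n"  # haploid call for S1
    "chr1\t300\t.\tC\tA\t.\tPASS\t.\tGT\t0|0\t.|1\n"
)

print(list(find_odd_records(io.StringIO(EXAMPLE_VCF))))
# → [('chr1', 200, 3)]
```

For a real file the same function can be fed an open file handle (for example from gzip.open(path, "rt") for a compressed VCF), since it only iterates over text lines.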

@eblerjana
Author

Hi,

Thanks! I did a few more tests and I suspect it is indeed a problem with my VCF; I think it comes from broken AN/AC fields. I've fixed my VCF and started another indexing run. I'll let you know whether the error is gone once the run has finished!

Best,
Jana
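
For reference, AN and AC can be checked against the genotypes directly: in the VCF specification, AN is the total number of called alleles and AC is the count of each ALT allele among them. The sketch below recomputes both for a single record and returns them next to the values stated in INFO. The function name recompute_an_ac and the example record are hypothetical, and this is not necessarily how the VCF in this thread was checked or repaired.

```python
def recompute_an_ac(record_line):
    """Return ((AN, AC) recomputed from GT, (AN, AC) as stated in INFO)
    for one tab-separated VCF record (hypothetical helper)."""
    fields = record_line.rstrip("\n").split("\t")
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    gt_idx = fields[8].split(":").index("GT")

    # Collect all called alleles; missing alleles (".") do not count.
    alleles = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_idx]
        alleles.extend(a for a in gt.replace("/", "|").split("|") if a != ".")

    n_alt = len(fields[4].split(","))
    an = len(alleles)
    ac = [alleles.count(str(i)) for i in range(1, n_alt + 1)]

    stated_an = int(info["AN"]) if "AN" in info else None
    stated_ac = [int(x) for x in info["AC"].split(",")] if "AC" in info else None
    return (an, ac), (stated_an, stated_ac)


# Made-up record whose INFO claims AN=4, AC=3, but one allele is missing.
record = "chr1\t100\t.\tA\tT\t.\tPASS\tAC=3;AN=4\tGT\t0|1\t1|."
computed, stated = recompute_an_ac(record)
print(computed, stated)
# → (3, [2]) (4, [3])
```

A record is consistent when the two returned tuples are equal. For large files, the bcftools fill-tags plugin can recompute these tags in bulk.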

@eblerjana
Author

Hi again,

Unfortunately, I'm still getting the same error. I uploaded the VCF and reference file I'm using here. I used the command mentioned above and I'm running version 2.0.7 of Kage.

I hope this is helpful, and sorry that the data is so big. I tried reproducing the error on a smaller subset of this VCF, but so far without success: the smaller regions I tested all ran without errors.

Thanks a lot for your help!
