With cpus=1, handle TSV post-processing safely · Issue #90 · formbio/laava · GitHub

Open
etal opened this issue Feb 27, 2025 · 0 comments
Labels: bug (Something isn't working)
Milestone: Next
@etal (Contributor) commented Feb 27, 2025

Observed error:

  Traceback (most recent call last):
    File "/opt/laava/summarize_alignment.py", line 1032, in <module>
      main(args)
    File "/opt/laava/summarize_alignment.py", line 896, in main
      subset_sam_by_readname_list(
    File "/opt/laava/summarize_alignment.py", line 55, in subset_sam_by_readname_list
      for row in csv.DictReader(per_read_f, delimiter="\t"):
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
  gzip.BadGzipFile: Not a gzipped file (b're')

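That happens because the .gz compression is only applied when cpus > 1: with cpus=1, a different code path runs and skips the aggregation and gzip steps, so the per-read TSV is left as plain text. When subset_sam_by_readname_list then reads it as gzip, the first two bytes of the file (b're') are not the gzip magic number, and gzip raises BadGzipFile.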
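A minimal sketch reproducing the failure, assuming downstream code opens the per-read table with `gzip.open` (as the `BadGzipFile` in the traceback implies) while the cpus=1 path left it as plain text. The file name, header, and helper names (`read_per_read_table`, `reproduce_cpus1_failure`) are hypothetical. gzip checks the first two bytes for its magic number (`b'\x1f\x8b'`) and reports whatever it found instead, which is why the message shows `b're'`.

```python
import csv
import gzip
import os
import tempfile


def read_per_read_table(path):
    """Read the per-read table the way the traceback suggests: gzip.open + csv.DictReader."""
    with gzip.open(path, "rt") as per_read_f:
        return list(csv.DictReader(per_read_f, delimiter="\t"))


def reproduce_cpus1_failure():
    """Write an uncompressed TSV (as the cpus=1 path does) and try to read it as gzip."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "per_read.tsv")  # hypothetical file name
        with open(path, "w", newline="") as f:
            # Hypothetical header: any plain text starting with "re" yields b're'.
            f.write("read_id\tvalue\n")
            f.write("read1\t42\n")
        try:
            read_per_read_table(path)
        except gzip.BadGzipFile as exc:
            return str(exc)
    return None


print(reproduce_cpus1_failure())  # Not a gzipped file (b're')
```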

Potential solutions:

  • Always use the multiprocessing path, even when cpus=1. (Least code, though inefficient.)
  • Fix the downstream issue(s) individually by checking for .gz extensions. (Perpetuates the inconsistency.)
  • Gzip the intermediate "chunks" as well, so that they are also valid .tsv.gz files, and handle them correctly in the aggregation step when cpus > 1. (Requires more code changes for little benefit.)
  • Run gzip directly on the generated .tsv files when cpus=1. (Straightforward, but adds more special-case code.)
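The last option could look roughly like this: a minimal sketch, assuming the cpus=1 branch knows the path of the plain .tsv it just wrote. `gzip_tsv` is a hypothetical helper name, not part of laava. It streams the file through `gzip.open` with `shutil.copyfileobj` and removes the uncompressed original, so the result has the same `.tsv.gz` form the cpus > 1 aggregation produces.

```python
import gzip
import os
import shutil


def gzip_tsv(tsv_path):
    """Compress tsv_path to tsv_path + '.gz' and delete the uncompressed file.

    Hypothetical helper for the cpus=1 special case only.
    """
    gz_path = tsv_path + ".gz"
    with open(tsv_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Copy in fixed-size blocks so large per-read tables aren't loaded into memory.
        shutil.copyfileobj(src, dst)
    os.remove(tsv_path)
    return gz_path
```

This keeps the change small, but it is exactly the extra special case the issue warns about: every new output written on the cpus=1 path would need the same call.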

The first option seems best: all of this chunking and iteration deserves to be rewritten anyway, and having less code will make that rewrite easier.
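A minimal sketch of the preferred option, assuming a hypothetical per-chunk worker (`write_chunk`) and hypothetical column names: `cpus` only sets the number of chunks, while chunking, aggregation and gzip always run, so cpus=1 yields the same `.tsv.gz` as cpus > 1. It uses `concurrent.futures.ThreadPoolExecutor` only so the example runs anywhere as a standalone snippet; a process pool (e.g. `multiprocessing.Pool`) would take the same place in a real pipeline.

```python
import csv
import gzip
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

FIELDS = ["read_id", "value"]  # hypothetical per-read columns


def write_chunk(chunk_idx, rows, workdir):
    """Hypothetical worker: write one chunk of per-read rows as plain TSV."""
    path = os.path.join(workdir, f"chunk_{chunk_idx}.tsv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return path


def summarize_per_read(rows, cpus, out_path):
    """Single code path for any cpus >= 1: chunk, run workers, aggregate, gzip."""
    n = max(1, cpus)
    size = -(-len(rows) // n) or 1  # ceiling division; keeps chunks contiguous
    chunks = [rows[i * size:(i + 1) * size] for i in range(n)]
    with tempfile.TemporaryDirectory() as workdir:
        with ThreadPoolExecutor(max_workers=n) as pool:
            # pool.map returns results in input order, so chunk order is preserved.
            chunk_paths = list(pool.map(write_chunk, range(n), chunks, [workdir] * n))
        # Aggregation + gzip always run, including when cpus == 1.
        with gzip.open(out_path, "wt", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t")
            writer.writeheader()
            for path in chunk_paths:
                with open(path, newline="") as f:
                    writer.writerows(csv.DictReader(f, delimiter="\t"))
    return out_path
```

The trade-off is the one the issue names: with cpus=1 this writes and re-reads one chunk file it doesn't strictly need, in exchange for a single code path to rewrite later.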

@etal added the bug (Something isn't working) label Feb 27, 2025
@etal added this to the Next milestone Mar 11, 2025