Observed error:
```
Traceback (most recent call last):
  File "/opt/laava/summarize_alignment.py", line 1032, in <module>
    main(args)
  File "/opt/laava/summarize_alignment.py", line 896, in main
    subset_sam_by_readname_list(
  File "/opt/laava/summarize_alignment.py", line 55, in subset_sam_by_readname_list
    for row in csv.DictReader(per_read_f, delimiter="\t"):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
gzip.BadGzipFile: Not a gzipped file (b're')
```
That's because the .gz suffix is only applied when cpus > 1; with cpus=1, a different code path skips the aggregation and gzip steps, so the file on disk is plain text even though downstream code opens it with gzip.
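For context, `gzip.open` succeeds lazily and only validates the magic bytes on the first read, which is why the failure surfaces inside the `csv.DictReader` loop rather than at open time. A minimal reproduction, using a made-up per-read table (the column names are hypothetical; only the leading bytes `b're'` matter):

```python
import csv
import gzip

# Stand-in for the per-read table written on the cpus=1 path:
# plain text, despite downstream code expecting a gzipped file.
with open("per_read.tsv", "w") as f:
    f.write("read_id\tassigned_type\nm001/42/ccs\tssAAV\n")

# gzip.open() itself does not fail; the first read does, because the
# file starts with b"re" instead of the gzip magic bytes.
with gzip.open("per_read.tsv", "rt") as per_read_f:
    for row in csv.DictReader(per_read_f, delimiter="\t"):
        pass  # gzip.BadGzipFile: Not a gzipped file (b're')
```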
Potential solutions:

1. Always use the multiprocessing path, even when cpus=1. (Least code, though slightly inefficient.)
2. Fix the downstream issue(s) individually by checking for .gz extensions. (Perpetuates the inconsistency; see the sketch after this list.)
3. Gzip the intermediate "chunks" as well, so that they are also valid .tsv.gz, and handle them correctly in the aggregation step when cpus > 1. (Requires more code changes for little benefit.)
4. Run gzip directly on the generated .tsv files when cpus=1. (Straightforward, but adds more special-case code.)
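For option 2, the fix at each read site would look something like the sketch below; `open_maybe_gzip` is a hypothetical helper, not an existing function in this codebase:

```python
import gzip

def open_maybe_gzip(path, mode="rt"):
    """Open transparently by extension: gzip for .gz files, plain otherwise."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)
```

Every caller that currently assumes a .gz suffix would switch to this. It works, but it leaves the cpus=1 vs. cpus>1 inconsistency in place.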
The first option seems best: all of this chunking and iteration deserves to be rewritten eventually, and having less code now makes that rewrite easier.
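A minimal sketch of option 1, assuming hypothetical `process_chunk` and `aggregate` stand-ins for the real chunking code in summarize_alignment.py. The point is that a `Pool` of size 1 behaves like serial execution, so cpus=1 goes through the same aggregation and gzip steps and the output is always a valid .tsv.gz:

```python
import gzip
import shutil
from multiprocessing import Pool

def process_chunk(chunk_path):
    """Hypothetical per-chunk worker; stands in for the real summarization.
    Here it simply returns the chunk path unchanged."""
    return chunk_path

def aggregate(chunk_paths, cpus, out_path):
    # One code path regardless of cpus: Pool(processes=1) runs the chunks
    # serially, so the gzip aggregation below always happens.
    with Pool(processes=max(cpus, 1)) as pool:
        part_paths = pool.map(process_chunk, chunk_paths)
    # Concatenate the per-chunk outputs into a single gzipped TSV.
    with gzip.open(out_path, "wb") as out_f:
        for part in part_paths:
            with open(part, "rb") as part_f:
                shutil.copyfileobj(part_f, out_f)
```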