Handling effects of setting containment_distance to 1 · Issue #71 · iqbal-lab-org/pling · GitHub

Handling effects of setting containment_distance to 1 #71


Closed
eri-lim opened this issue Aug 14, 2024 · 3 comments · Fixed by #72

Comments

@eri-lim
eri-lim commented Aug 14, 2024

Hi there! Great tool. I set containment_distance to 1 because I wanted as few plasmids as possible to be filtered out, including highly dissimilar ones, so that I could also look at the DCJ-Indel distance between such plasmid pairs. However, with this setting the Snakemake workflow failed partway through the run.

As I understand it, a containment distance of 1 would in theory make the DCJ-Indel distance impossible to calculate, since the number of operations needed to transform one plasmid into the other could not be determined.

Would it therefore be useful to handle such problematic cases internally, or to restrict the maximum allowed containment distance, so that the pipeline does not fail completely?
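
As a rough illustration of the second suggestion, here is a minimal sketch of range validation at argument-parsing time. It is not Pling's actual code: the helper `distance_in_range` and its `include_upper` switch are hypothetical names, and whether a value of exactly 1 should be rejected is a design decision for the maintainers.

```python
import argparse
import math


def distance_in_range(upper=1.0, include_upper=True):
    """Build an argparse type that accepts floats between 0 and `upper`.

    Hypothetical helper for illustration only, not part of Pling.
    """
    def parse(value):
        try:
            number = float(value)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{value!r} is not a number")
        # Reject NaN explicitly, since every comparison with NaN is False.
        too_high = number > upper if include_upper else number >= upper
        if math.isnan(number) or number < 0.0 or too_high:
            raise argparse.ArgumentTypeError(
                f"containment distance {number} is outside the allowed range"
            )
        return number
    return parse


parser = argparse.ArgumentParser()
# With include_upper=False, a value of exactly 1 is rejected up front.
parser.add_argument(
    "--containment_distance",
    type=distance_in_range(upper=1.0, include_upper=False),
    required=True,
)
```

With this in place, `--containment_distance 1.0` would stop with a usage error before any batch job starts, instead of failing inside the workflow.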

Thanks so much!

@babayagaofficial
Collaborator

Hi, can you please send me the error message you received?

@babayagaofficial
Collaborator

Also just to note on the theory of the DCJ-Indel distance between two completely different plasmids, say plasmid A and plasmid B -- their distance would be 2, because you'd basically end up with the following integer sequence representation for the two plasmids:

A: 1
B: 2

You can then go from A to B in two operations: delete 1 from A, then insert 2.

Basically the distance is mathematically still defined, just biologically nonsensical, hence the motivation for the containment distance threshold.
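
The two-operation argument above can be sketched in a few lines of Python. This is only a toy illustration, assuming each plasmid has been integerised to a single block as in the A/B example; the function name `transform_disjoint` is made up here, and this is not how Pling or dingII compute distances in general.

```python
def transform_disjoint(a, b):
    """Turn integer sequence `a` into `b` when they share no blocks.

    Returns the resulting sequence and the list of operations used.
    Toy illustration only: one deletion followed by one insertion.
    """
    if not set(a).isdisjoint(b):
        raise ValueError("sequences share blocks; this sketch does not apply")
    operations = []
    current = list(a)
    # Operation 1: delete everything that came from A.
    operations.append(("delete", tuple(current)))
    current = []
    # Operation 2: insert everything that belongs to B.
    operations.append(("insert", tuple(b)))
    current = list(b)
    return current, operations


result, operations = transform_disjoint([1], [2])
print(result, len(operations))
```

For A = [1] and B = [2], the sequence ends up as [2] after two operations, matching the distance of 2 described above.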

I suspect I know why the pipeline is failing in this case, but it'll be easier to pin down if you are able to send me whatever error message you received.

@eri-lim
Author
eri-lim commented Aug 14, 2024

Thank you for the insight on the theory!

This is a segment of the error message in the Pling output; the message repeats itself for various batches/jobids.

[Thu Aug 15 01:38:01 2024]
Error in rule glpk_and_ding:
    jobid: 0
    input: WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv
    output: WORKING_DIR/output/tmp_files/dists_batchwise/batch_2029_dcj.tsv
    conda-env: WORKING_DIR/.snakemake/conda/e56ad01a23178879ae50d78c3c74b859_
    shell:
        
                PYTHONPATH=PLING_DIR python PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py                         --batch 2029                         --containment_tsv WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv                         --containment_distance 1.0                         --outputpath WORKING_DIR/output                         --communitypath WORKING_DIR/output/containment/containment_communities/objects/communities.txt                         --integerisation align                                                  --threads 1                         --snakefile_dir PLING_DIR/pling/dcj_snakemake
                
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message
WorkflowError:
At least one job did not complete successfully.
[Thu Aug 15 01:38:01 2024]
Error in rule glpk_and_ding:
    jobid: 2032
    input: WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv
    output: WORKING_DIR/output/tmp_files/dists_batchwise/batch_2029_dcj.tsv
    conda-env: WORKING_DIR/.snakemake/conda/e56ad01a23178879ae50d78c3c74b859_
    shell:
        
                PYTHONPATH=PLING_DIR python PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py                         --batch 2029                         --containment_tsv WORKING_DIR/output/tmp_files/containment_batchwise/batch_2029_containment.tsv                         --containment_distance 1.0                         --outputpath WORKING_DIR/output                         --communitypath WORKING_DIR/output/containment/containment_communities/objects/communities.txt                         --integerisation align                                                  --threads 1                         --snakefile_dir PLING_DIR/pling/dcj_snakemake
                
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Traceback (most recent call last):
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 98, in <module>
    main()
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 95, in main
    batchwise_ding(pairs, float(args.containment_distance), containments, args.integerisation, args.outputpath, args.batch, timelimit, args.snakefile_dir, plasmid_to_community)
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 52, in batchwise_ding
    unimog_to_ilp(unimog, lp, entry1, entry2)
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 15, in unimog_to_ilp
    raise e
  File "PLING_DIR/pling/dcj_snakemake/glpk_and_ding.py", line 10, in unimog_to_ilp
    subprocess.run(f"dingII generate {unimog} -mm --writeilp {lp} -p {genome1} {genome2}", shell=True, check=True, capture_output=True)
  File "WORKING_DIR/.snakemake/conda/e56ad01a23178879ae50d78c3c74b859_/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'dingII generate WORKING_DIR/output/unimogs/batch_2623_align.unimog -mm --writeilp ding/ilp/PLASMID1~PLASMID2.lp -p PLASMID1~PLASMID2:PLASMID1 PLASMID1~PLASMID2:PLASMID2' returned non-zero exit status 1.
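
The traceback above only reports the exit status, because the dingII call runs with `capture_output=True` and the captured stderr is not printed. A minimal sketch of how the underlying message could be surfaced, assuming one is free to wrap the call (the wrapper `run_and_report` is a hypothetical name, not part of Pling):

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and print its captured stderr if it fails.

    Hypothetical wrapper for illustration; it re-raises the original
    error so that the caller still sees the failure.
    """
    try:
        return subprocess.run(command, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as error:
        # CalledProcessError keeps the captured streams on the exception.
        print(f"command failed with exit status {error.returncode}", file=sys.stderr)
        print(error.stderr, file=sys.stderr)
        raise
```

Running the failing dingII command through a wrapper like this would show whatever dingII wrote to stderr alongside the traceback.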

@babayagaofficial babayagaofficial mentioned this issue Aug 15, 2024
Merged
5 tasks
babayagaofficial added a commit that referenced this issue Aug 16, 2024