Revise input fields and metadata I/O #50

etal · 2024-07-13T02:37:55Z

First step in revising the classification and output TSV schema for v3.0 -- new input fields.

Take ITR labels as input and read from annotation BED to use in classifications. (Add ITR label fields and read ITR coordinates from vector annotation BED #47)
Take helper_name, lambda_name inputs to label packaging sequences correctly. (Take "helper/lambda name" inputs and report unidentified references with their own names in Table 1 #31)
Take sample metadata inputs (Take sample IDs and other metadata as a tabular input (CSV/TSV) #51) and write them, along with the sequencing run ID extracted from the sequencing reads, to per-sample metadata output TSV. (Consolidate output TSVs and remove .Rdata output #29, in part)

- For single-sample input: sample_unique_id, sample_display_name
- For multi-sample input: sample_in_metadata

Unpack these in the 'laava' named workflow so that single- and multi-sample inputs are equivalent in the downstream processes.
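
To illustrate the unpacking idea, here is a minimal Python sketch of how the two input modes could be reduced to one shape. It is not the Nextflow code in this PR: the function name `normalize_samples`, the assumption that the multi-sample table is a TSV with `sample_unique_id` and `sample_display_name` columns, and the fallback of display name to unique ID are all hypothetical.

```python
import csv
import io


def normalize_samples(sample_unique_id=None, sample_display_name=None,
                      sample_in_metadata=None):
    """Return a list of (sample_id, display_name) pairs for either input mode.

    Hypothetical helper: the column names and the display-name fallback are
    assumptions for illustration, not the workflow's documented behavior.
    """
    if sample_in_metadata is not None:
        # Multi-sample mode: one row per sample in a tab-separated table.
        reader = csv.DictReader(io.StringIO(sample_in_metadata), delimiter="\t")
        return [(row["sample_unique_id"], row["sample_display_name"])
                for row in reader]
    # Single-sample mode: wrap the one sample in a list so that downstream
    # code sees the same structure in both modes.
    return [(sample_unique_id, sample_display_name or sample_unique_id)]


single = normalize_samples("s1", "Sample 1")
multi = normalize_samples(
    sample_in_metadata="sample_unique_id\tsample_display_name\n"
                       "s1\tSample 1\ns2\tSample 2\n"
)
```

Either way, downstream steps iterate over the same list of pairs and do not need to know which mode was used.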

alphabdiallo

We can align on the comments and address them on the next release.

alphabdiallo · 2024-08-05T17:36:41Z

workflows/laava/inputs.md

+    Terminal Repeat (ITR) regions (see itr_label_1 and itr_label_2 below) or, as a
+    legacy mode, one region with the label 'vector', spanning both ITRs (inclusive).
+  - May also include additional labeled regions, e.g. for promoter and CDS regions;
+    these will be ignored and will not affect the output.


We should consider adding minimal BED file information to the documentation.

What additional information would you like to see here? An explanation of the BED format?

alphabdiallo · 2024-08-05T17:40:10Z

workflows/laava/workflow.json

      "format": "text",
      "hidden": false,
      "required": false,
-      "default": "ITR",


We should keep the default as ITR for both the first and second ITR.
What is the rationale for using first and second, rather than left and right as before?

From looking at a circular diagram in SnapGene, users might confuse or swap the order of the ITRs in the exported BED/Genbank file. This way, "ITR-R" followed by "ITR-L" will work, even if swapped. As a side effect, entering "ITR" for both itr_label_1 and itr_label_2 works the same as leaving itr_label_2 blank.
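
As a rough illustration of why the order does not matter, the sketch below treats the two labels as an unordered set when scanning the BED file. This is not the actual prepare_annotation code: the function name `find_itr_regions`, the example coordinates, and the sequence name `vec` are hypothetical.

```python
def find_itr_regions(bed_lines, itr_label_1="ITR", itr_label_2=""):
    """Return sorted (start, end) pairs for BED regions named by an ITR label.

    Hypothetical sketch: labels are collected into a set, so swapping them or
    repeating the same label twice gives the same result.
    """
    labels = {label for label in (itr_label_1, itr_label_2) if label}
    regions = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # no name column, so the region cannot be matched
        start, end, name = int(fields[1]), int(fields[2]), fields[3]
        if name in labels:
            regions.append((start, end))
    return sorted(regions)


# Example BED content with made-up coordinates; "ITR-R" is listed first.
bed = [
    "vec\t3000\t3145\tITR-R",
    "vec\t0\t145\tITR-L",
    "vec\t200\t900\tpromoter",  # other labeled regions are ignored
]
swapped = find_itr_regions(bed, "ITR-R", "ITR-L")
ordered = find_itr_regions(bed, "ITR-L", "ITR-R")
```

Here `swapped` and `ordered` hold the same two regions, and passing "ITR" for both labels matches the same regions as passing "ITR" with a blank second label.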

Add inputs itr_label_1|2, helper_name, lambda_name (#31, #47)

488706c

etal requested a review from alphabdiallo July 15, 2024 17:20

etal added 2 commits July 15, 2024 10:27

Fix issues identified by ruff (#45)

8dfa202

Use ruff in CI for static checks (closes #45)

f67d627

etal force-pushed the schema3 branch from ad3cc2b to f67d627 Compare July 15, 2024 17:27

etal added 2 commits July 15, 2024 20:46

Allow *.bam folder input

87c034b

Add Nextflow integration test for minimal input: no packaging etc.

e0f5c9b

etal force-pushed the schema3 branch from 0fcf8a5 to 8e7672d Compare July 23, 2024 16:56

etal requested a review from ptn24-formbio July 23, 2024 16:57

etal force-pushed the schema3 branch from 8e7672d to 4c7d3db Compare July 25, 2024 20:54

Add input fields for sample metadata (#51)

5f92649

- For single-sample input: sample_unique_id, sample_display_name
- For multi-sample input: sample_in_metadata

Unpack these in the 'laava' named workflow so that single- and multi-sample inputs are equivalent in the downstream processes.

etal force-pushed the schema3 branch from 4c7d3db to 5f92649 Compare July 25, 2024 23:20

etal added 4 commits July 25, 2024 17:43

Implement per-sample metadata output (#29, closes #51)

6e14475

prepare_annotation: Take and use input ITR labels

fe9a234

Add get_reference_names.py

3a1bd28

prepare_annotation: Allow nonstandard reference "source type" names

Use get_reference_names.py in Nextflow; fix issues (#47; closes #31)

5e38d16

etal changed the title ~~Revise classification and output TSV schema~~ Revise input fields and metadata I/O Jul 31, 2024

etal marked this pull request as ready for review July 31, 2024 05:43

etal requested review from dougnukem and mcrocker-bioborg July 31, 2024 05:44

Update workflow input documentation

4943b79

etal force-pushed the schema3 branch from 0c3ed25 to 4943b79 Compare July 31, 2024 17:43

prepare_annotation: update docstring to reflect ITR vs 'vector'

56e0ed4

etal requested a review from Magdoll July 31, 2024 23:39

alphabdiallo approved these changes Aug 5, 2024


etal merged commit ab844ef into main Aug 6, 2024
3 checks passed

etal deleted the schema3 branch August 6, 2024 14:55
