8000 [IO] Make `seqan3::sam_file_header::program_info_t` easier to copy · Issue #3137 · seqan/seqan3 · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[IO] Make seqan3::sam_file_header::program_info_t easier to copy #3137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tsnorri opened this issue Mar 21, 2023 · 6 comments · Fixed by #3145
Closed

[IO] Make seqan3::sam_file_header::program_info_t easier to copy #3137

tsnorri opened this issue Mar 21, 2023 · 6 comments · Fixed by #3145
Labels
feature/proposal a new feature or an idea of ready to tackle Fully specified, discussed and understood. Can be tackled.

Comments

@tsnorri
Copy link
Contributor
tsnorri commented Mar 21, 2023

When processing alignments in such a way that both the input and the output are SAM/BAM files, one would typically like to append a @PG record to the SAM file headers and, to that end, copy the existing @PG records. Currently this is not very easy under some circumstances as sam_file_header::program_info_t depends on sam_file_header’s template parameter ref_ids_type (i.e. when the input and output files have a different type for ref_ids_type).

Suggested solutions include:

  • Making program_info_t not depend on ref_ids_type by e.g. moving it outside sam_file_header
  • Adding a conversion operator, converting constructor or similar for program_info_t.

(I’m not sure what would be the best approach but once one is chosen, I could volunteer to contribute a patch.)

@tsnorri tsnorri added the feature/proposal a new feature or an idea of label Mar 21, 2023
@tsnorri tsnorri changed the title Make seqan3::sam_file_header::program_info_t easier to copy [IO] Make seqan3::sam_file_header::program_info_t easier to copy Mar 21, 2023
@tsnorri
Copy link
Contributor Author
tsnorri commented Mar 21, 2023

Here’s a suggestion:

  • Move program_info_t outside sam_file_header and rename it to e.g. sam_file_program_info
  • Add typedef sam_file_program_info program_info_t to sam_file_header, possibly with [[deprecated("Please use sam_file_program_info instead")]]

@smehringer
Copy link
Member
smehringer commented Mar 27, 2023

Hi @tsnorri,

thanks for reaching out and sorry for the delay.

You are absolutely right. This copy should be way easier. Let me investigate a little and then get back to you. Thanks in advance for volunteering to help!

This copy (for anyone following the thread):

#include <seqan3/io/sam_file/all.hpp>

int main()
{
    seqan3::sam_file_input in{"in.sam"};
    seqan3::sam_file_output out{"out.sam"};

    auto it = in.begin(); // cache first record which also reads the header

    out.header().program_infos = in.header().program_infos; // error :(
}

https://godbolt.org/z/3WrKsMMsb

@smehringer
Copy link
Member
smehringer commented Mar 27, 2023

Hi @tsnorri,

after a quick dive into the code, we agree with your suggestion.

The tasks would be:

  • Move the program_info_t from the class scope to the outer scope (right above the seqan3::sam_file_head in the same file) and rename it to sam_file_program_info_t (within the seqan3 namespace)
  • add a using program_info_t = sam_file_program_info_t; within the seqan3::sam_file_header class
  • Add a unit test in test/unit/io/sam_file/sam_file_output_test.cpp
    somehting like : (not tested)
    TEST(header, copy_program_info_t)
    {
      std::string comp =
          R"(@HD	VN:1.6	SO:unknown	GO:none
    @SQ	SN:ref	LN:26
    @PG	ID:prog1	PN:cool_program
    @CO	This is a comment.
    read1	41	ref	1	61	1S1M1D1M1I	ref	10	300	ACGT	!##$	AS:i:2	NM:i:7
    )";
        seqan3::sam_file_input fin{std::istringstream{comp}, seqan3::format_sam{}};
        seqan3::sam_file_output fout{std::ostringstream{}, seqan3::format_sam{}};
    
        auto it = in.begin(); // cache first record which also reads the header
    
        fout.header().program_infos = fin.header().program_infos; 
    }

It would be great if you can tackle this. Otherwise just drop a note that you have no time and we can work on this.

Best,
Svenja

@smehringer smehringer added the ready to tackle Fully specified, discussed and understood. Can be tackled. label Mar 27, 2023
tsnorri added a commit to tsnorri/seqan3 that referenced this issue Apr 20, 2023
tsnorri added a commit to tsnorri/seqan3 that referenced this issue Apr 20, 2023
tsnorri added a commit to tsnorri/seqan3 that referenced this issue Apr 21, 2023
@tsnorri
Copy link
Contributor Author
tsnorri commented Apr 21, 2023

Hi @smehringer and sorry about the delay! I finally had some time to fix the issue; the pull request above should have everything needed.

Best,
Tuukka

@tsnorri
Copy link
Contributor Author
tsnorri commented Apr 21, 2023

Actually scratch that. I just realised that the test I wrote did not actually test the intended functionality. I’ll re-refine it.

tsnorri added a commit to tsnorri/seqan3 that referenced this issue Apr 21, 2023
- The purpose of the test is to check that copying headers works in a situation where ref_id_types of the source and the destination differ.
- See seqan#3137.
@tsnorri
Copy link
Contributor Author
tsnorri commented Apr 21, 2023

I had noticed earlier but forgotten that the solution to this may not be the one that I first thought was the obvious one. If ref_ids_type in the destination is ref_info_not_given, then calling header() will not work, which does make sense. Consequently the easiest way to append a program_info_t is to copy the source header, modify it, and finally reset the header pointer in the records. I think that worked even without decoupling program_info_t, though? Modifying the header object of the output file seemed to be the intended approach at first but I guess either is fine.

In any case, now program_info_t does not depend on sam_file_header’s template parameter and there is a test that checks that copying is possible.

eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
- The purpose of the test is to check that copying headers works in a situation where ref_id_types of the source and the destination differ.
- See seqan#3137.
eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
- The purpose of the test is to check that copying headers works in a situation where ref_id_types of the source and the destination differ.
- See seqan#3137.
eseiler pushed a commit to tsnorri/seqan3 that referenced this issue Apr 24, 2023
eseiler pushed a commit that referenced this issue Apr 25, 2023
* [MISC] Make seqan3::sam_file_header::program_info_t easier to copy

See #3137
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature/proposal a new feature or an idea of ready to tackle Fully specified, discussed and understood. Can be tackled.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants
0