Running sequence alignment with alphabets with more than 256 characters #3341

Open
andrestiraboschieclypsium opened this issue Feb 19, 2025 · 3 comments · May be fixed by #3342
Labels
question a user question how to do certain things

Comments

andrestiraboschieclypsium commented Feb 19, 2025

Platform

  • SeqAn version: 3.4.0
  • Operating system: Linux
  • Compiler: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

Question

Hi,
Is it possible to compute sequence alignments using alphabets larger than 256 characters?

For instance, I tried running one of the tutorial examples with an alphabet of more than 256 characters, which I defined like this:

class example_alphabet : public seqan3::alphabet_base<example_alphabet, 1333, char16_t>
{
.
.
.
};

and when building this code:

    // Invoke the pairwise alignment which returns a lazy range over alignment results.
    auto results_example = seqan3::align_pairwise(std::tie(example_alphabet_vector_1, example_alphabet_vector_2), config);
    auto & res_example = *results_example.begin();
    seqan3::debug_stream << "Score: " << res_example.score() << '\n';
    return 0;

I get errors like this:

/home/eclypsium/Workspace/v2d/seqan3/tutorial/seqan3/include/seqan3/alphabet/composite/alphabet_variant.hpp:134:25: error: static assertion failed: The alphabet_variant is currently only tested for alphabets with char_type char. Contact us on GitHub if you have a different use case: https://github.com/seqan/seqan3 .
  134 |     static_assert((std::is_same_v<alphabet_char_t<alternative_types>, char> && ...),
      | 

Looking at the code, it seems that char is hardcoded as the char_type in many places.
Is there any way to work around this?

Best regards,
Andrés Tiraboschi

@andrestiraboschieclypsium andrestiraboschieclypsium added the question a user question how to do certain things label Feb 19, 2025
Member
eseiler commented Feb 19, 2025

Hey there,

We had a similar issue in #3271.

Since your alphabet type would fit in a char16_t, it should be possible to modify alphabet_variant to allow for this.

I will try it out tomorrow.
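
To make the size argument concrete, here is a minimal, library-independent sketch of why 1333 letters do not fit in char but do fit in char16_t. The names `max_of_v`, `smallest_uint_t` and `fits_char_type_v` are hypothetical helpers invented for this illustration; they are not part of SeqAn3.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch
{

// Largest value representable by T, widened so that comparisons are unambiguous.
template <typename T>
inline constexpr std::uintmax_t max_of_v = static_cast<std::uintmax_t>(std::numeric_limits<T>::max());

// Hypothetical helper: the smallest unsigned integer type that can hold 0..max_value.
template <std::uintmax_t max_value>
using smallest_uint_t = std::conditional_t<
    (max_value <= max_of_v<std::uint8_t>),
    std::uint8_t,
    std::conditional_t<(max_value <= max_of_v<std::uint16_t>),
                       std::uint16_t,
                       std::conditional_t<(max_value <= max_of_v<std::uint32_t>), std::uint32_t, std::uint64_t>>>;

// Hypothetical helper: true if every rank in 0..size-1 maps to a distinct value of char_t.
template <std::size_t size, typename char_t>
inline constexpr bool fits_char_type_v = (size - 1) <= max_of_v<char_t>;

// The alphabet size from the question.
inline constexpr std::size_t example_size = 1333;

// The largest rank is 1332, which needs a 16-bit integer.
static_assert(std::is_same_v<smallest_uint_t<example_size - 1>, std::uint16_t>);
// char holds at most 255 (or 127 if signed), so it cannot represent all letters.
static_assert(!fits_char_type_v<example_size, char>);
// char16_t holds up to 65535, so it can.
static_assert(fits_char_type_v<example_size, char16_t>);

} // namespace sketch
```

The checks are all compile-time, so the snippet only needs to compile; nothing runs.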

Member
eseiler commented Feb 20, 2025

When I apply this patch

diff --git a/include/seqan3/alphabet/composite/alphabet_variant.hpp b/include/seqan3/alphabet/composite/alphabet_variant.hpp
index 82b035a99..df411d921 100644
--- a/include/seqan3/alphabet/composite/alphabet_variant.hpp
+++ b/include/seqan3/alphabet/composite/alphabet_variant.hpp
@@ -121,18 +121,22 @@ template <typename... alternative_types>
     requires (detail::writable_constexpr_alphabet<alternative_types> && ...) && (std::regular<alternative_types> && ...)
           && (sizeof...(alternative_types) >= 2)
 class alphabet_variant :
-    public alphabet_base<alphabet_variant<alternative_types...>,
-                         (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
-                         char>
+    public alphabet_base<
+        alphabet_variant<alternative_types...>,
+        (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
+        std::conditional_t<(std::same_as<alphabet_char_t<alternative_types>, char> && ...), char, char16_t>>
 {
 private:
     //!\brief The base type.
-    using base_t = alphabet_base<alphabet_variant<alternative_types...>,
-                                 (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
-                                 char>;
-
-    static_assert((std::is_same_v<alphabet_char_t<alternative_types>, char> && ...),
-                  "The alphabet_variant is currently only tested for alphabets with char_type char. "
+    using base_t = alphabet_base<
+        alphabet_variant<alternative_types...>,
+        (static_cast<size_t>(alphabet_size<alternative_types>) + ...),
+        std::conditional_t<(std::same_as<alphabet_char_t<alternative_types>, char> && ...), char, char16_t>>;
+
+    static_assert(((std::is_same_v<alphabet_char_t<alternative_types>, char>
+                    || std::is_same_v<alphabet_char_t<alternative_types>, char16_t>)
+                   && ...),
+                  "The alphabet_variant is currently only tested for alphabets with char_type char or char16_t. "
                   "Contact us on GitHub if you have a different use case: https://github.com/seqan/seqan3 .");
 
     //!\brief Befriend the base type.
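
The selection logic in this patch can be tried in isolation. The following is a minimal sketch under stated assumptions: the four `*_alphabet` structs are hypothetical stand-ins that only expose a `char_type`, and `variant_char_t` / `supported_char_types_v` are illustrative names, not SeqAn3 entities. It uses `std::is_same_v` rather than `std::same_as` so that it also compiles as C++17.

```cpp
#include <type_traits>

namespace sketch
{

// Hypothetical stand-ins for alphabets; only the char_type matters here.
struct char_alphabet        { using char_type = char; };
struct other_char_alphabet  { using char_type = char; };
struct wide_alphabet        { using char_type = char16_t; };
struct unsupported_alphabet { using char_type = char32_t; };

template <typename alphabet_t>
using char_type_t = typename alphabet_t::char_type;

// Same shape as the std::conditional_t in the patch:
// char if every alternative uses char, otherwise char16_t.
template <typename... alternative_types>
using variant_char_t =
    std::conditional_t<(std::is_same_v<char_type_t<alternative_types>, char> && ...), char, char16_t>;

// Same shape as the relaxed static_assert in the patch:
// every alternative must use either char or char16_t.
template <typename... alternative_types>
inline constexpr bool supported_char_types_v =
    ((std::is_same_v<char_type_t<alternative_types>, char>
      || std::is_same_v<char_type_t<alternative_types>, char16_t>)
     && ...);

// All alternatives use char: the combined type stays char.
static_assert(std::is_same_v<variant_char_t<char_alphabet, other_char_alphabet>, char>);
// One alternative uses char16_t: the combined type widens to char16_t.
static_assert(std::is_same_v<variant_char_t<char_alphabet, wide_alphabet>, char16_t>);
// char32_t is still rejected by the relaxed check.
static_assert(supported_char_types_v<char_alphabet, wide_alphabet>);
static_assert(!supported_char_types_v<char_alphabet, unsupported_alphabet>);

} // namespace sketch
```

Note that the conditional alone would also map a char32_t alternative to char16_t, which is why the separate check on the supported types is still needed.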

It seems to work just fine

#include <ranges>
#include <tuple>
#include <vector>

#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alphabet/alphabet_base.hpp>
#include <seqan3/core/debug_stream.hpp>

namespace example
{

class example_alphabet : public seqan3::alphabet_base<example_alphabet, 1333, char16_t>
{
    using base_t = seqan3::alphabet_base<example_alphabet, 1333, char16_t>;

public:
    using base_t::base_t;

    static constexpr char16_t rank_to_char(rank_type const rank)
    {
        return static_cast<char16_t>(rank);
    }

    static constexpr rank_type char_to_rank(char16_t const chr)
    {
        return static_cast<rank_type>(chr);
    }
};

inline namespace literals
{

constexpr example_alphabet operator""_example(char const c) noexcept
{
    return example_alphabet{}.assign_char(c);
}

constexpr std::vector<example_alphabet> operator""_example(char const * const s, size_t const n)
{
    std::vector<example_alphabet> r;
    r.resize(n);

    for (size_t i = 0; i < n; ++i)
        r[i].assign_char(s[i]);

    return r;
}

} // namespace literals

} // namespace example

int main()
{
    using namespace example::literals;

    std::vector<example::example_alphabet> seq1 = "ACGTGATG!!@@++"_example;
    std::vector<example::example_alphabet> seq2 = "AGTGATACT!!@@++"_example;

    seqan3::configuration cfg = seqan3::align_cfg::method_global{} | seqan3::align_cfg::edit_scheme;
    auto results_example = seqan3::align_pairwise(std::tie(seq1, seq2), cfg);

    auto & res_example = *results_example.begin();
    seqan3::debug_stream << "Score: " << res_example.score() << '\n';

    // char16_t cannot be printed directly, so we need to convert it to char.
    auto adaptor = std::views::transform(
        [](auto const & in)
        {
            auto letter = seqan3::to_char(in);
            return static_cast<char>(letter);
        });

    auto && [p1, p2] = res_example.alignment();
    seqan3::debug_stream << adaptor(p1) << '\n';
    seqan3::debug_stream << adaptor(p2) << '\n';

    // Score: -4
    // ACGTGATG--!!@@++
    // A-GTGATACT!!@@++
}

Seems like alphabet_variant is the only gatekeeper. All other parts are generic and use the rank/char type of the alphabet.
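
As a rough illustration of what "generic over the rank type" means, here is a small sketch that does not use SeqAn3 at all. It assumes a hypothetical `wide_symbol` type with 1333 letters and a plain two-row Levenshtein distance that compares symbols only through `to_rank()`; it is not how `align_pairwise` is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>
#include <vector>

namespace sketch
{

// Hypothetical stand-in for an alphabet with 1333 letters; not a SeqAn3 type.
class wide_symbol
{
public:
    using rank_type = std::uint16_t;
    using char_type = char16_t;

    static constexpr rank_type alphabet_size = 1333;

    constexpr wide_symbol() = default;

    // The character value is used directly as the rank, as in the example above.
    constexpr explicit wide_symbol(char_type const chr) : rank_{static_cast<rank_type>(chr)}
    {
        assert(rank_ < alphabet_size);
    }

    constexpr rank_type to_rank() const noexcept
    {
        return rank_;
    }

private:
    rank_type rank_{};
};

// Converts UTF-16 code units into a sequence of symbols.
inline std::vector<wide_symbol> to_symbols(std::u16string_view const text)
{
    std::vector<wide_symbol> result;
    result.reserve(text.size());
    for (char16_t const chr : text)
        result.emplace_back(chr);
    return result;
}

// Unit-cost edit distance; works for any symbol_t that provides to_rank().
template <typename symbol_t>
std::size_t edit_distance(std::vector<symbol_t> const & first, std::vector<symbol_t> const & second)
{
    std::vector<std::size_t> previous(second.size() + 1);
    std::vector<std::size_t> current(second.size() + 1);

    // Distance from the empty prefix of `first` to each prefix of `second`.
    for (std::size_t j = 0; j <= second.size(); ++j)
        previous[j] = j;

    for (std::size_t i = 1; i <= first.size(); ++i)
    {
        current[0] = i;
        for (std::size_t j = 1; j <= second.size(); ++j)
        {
            bool const mismatch = first[i - 1].to_rank() != second[j - 1].to_rank();
            std::size_t const substitution = previous[j - 1] + (mismatch ? 1u : 0u);
            current[j] = std::min({previous[j] + 1, current[j - 1] + 1, substitution});
        }
        std::swap(previous, current);
    }

    return previous[second.size()];
}

} // namespace sketch
```

Calling `sketch::edit_distance(sketch::to_symbols(u"ACGTGATG!!@@++"), sketch::to_symbols(u"AGTGATACT!!@@++"))` should return 4, which corresponds to the score of -4 reported above under unit edit costs. Nothing in the function depends on the symbols fitting into a char.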

@eseiler eseiler linked a pull request Feb 20, 2025 that will close this issue
andrestiraboschieclypsium
Author

Cool! Thanks, I'll give it a try.
