LM slower than the encoder-decoder with the same depth and max_seq_len, window size · Issue #20 · lucidrains/routing-transformer
Open
@AliOskooeiTR

Description


This is more of a question for a sanity check than an issue. I have trained the routing transformer encoder-decoder in the past and was really impressed by the speed: I got about 4 iter/sec training on sequences roughly 12,000 tokens long. Now I am training a language model with a depth equal to the encoder/decoder depth of my old model, keeping all other parameters the same, and the training rate for the LM has fallen below 1 iter/sec. I was wondering if this is to be expected, or whether there may be something wrong that I need to look into. Thank you for your help.
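For concreteness, here is a minimal sketch of the kind of configurations I am comparing, using the `RoutingTransformerLM` and `RoutingTransformerEncDec` constructors from the repo README; the hyperparameter values below are placeholders rather than my exact settings:

```python
import torch
from routing_transformer import RoutingTransformerLM, RoutingTransformerEncDec

# Causal language model: a single stack (depth is a placeholder value).
lm = RoutingTransformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,             # same depth as each side of the enc-dec below
    max_seq_len = 12288,   # placeholder; chosen to be divisible by window_size
    causal = True,
    window_size = 128
)

# Encoder-decoder: two stacks with the same depth, max_seq_len and window_size.
enc_dec = RoutingTransformerEncDec(
    dim = 512,
    enc_num_tokens = 20000,
    enc_depth = 6,
    enc_max_seq_len = 12288,
    enc_window_size = 128,
    dec_num_tokens = 20000,
    dec_depth = 6,
    dec_max_seq_len = 12288,
    dec_window_size = 128
)

x = torch.randint(0, 20000, (1, 12288))
out, aux_loss = lm(x)  # the LM returns logits plus an auxiliary clustering loss
```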
