release more efficient look_around thanks to @minh-nguyenhoang !
hyper connect the local transformer
allow for inserting layers before the local attention, for testing a new neural memory paper
remove a warning
make sure kv caching with dynamic pos bias works for local transformer
remove adding staticmethod
oops
revert back to using plain apply_rotary_pos_emb fn
allow for argmax sampling
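The `look_around` commit refers to the windowing helper local attention uses: each window of keys/values is concatenated with its neighboring windows so queries can attend to nearby context. A minimal sketch of that idea, assuming a `(batch, windows, window_size, dim)` layout; this is an illustrative reimplementation, not the repository's exact (more efficient) version:

```python
import torch
import torch.nn.functional as F

def look_around(x, backward = 1, forward = 0, pad_value = -1.):
    # x: (batch, windows, window_size, dim)
    b, w, n, d = x.shape
    # pad along the window axis so edge windows have neighbors to borrow
    padded = F.pad(x, (0, 0, 0, 0, backward, forward), value = pad_value)
    # slice out each shifted copy and stitch them along the sequence axis,
    # giving every window a view of its `backward`/`forward` neighbors
    shifted = [padded[:, i:i + w] for i in range(backward + forward + 1)]
    return torch.cat(shifted, dim = 2)
```

With `backward = 1`, window `i` ends up holding the tokens of windows `i - 1` and `i`, with out-of-range positions filled by `pad_value` (which the attention mask can then exclude).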
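The argmax-sampling commit adds greedy decoding alongside stochastic sampling. A hedged sketch of the pattern, with an illustrative `sample` function rather than the repository's actual API:

```python
import torch

def sample(logits: torch.Tensor, greedy: bool = False) -> torch.Tensor:
    # logits: (batch, vocab_size) unnormalized scores for the next token
    if greedy:
        # argmax sampling: deterministically pick the highest-scoring token
        return logits.argmax(dim = -1)
    # otherwise sample from the softmax distribution
    probs = logits.softmax(dim = -1)
    return torch.multinomial(probs, num_samples = 1).squeeze(-1)
```

Greedy decoding is deterministic and useful for reproducible evaluation, while multinomial sampling preserves diversity in generation.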