Insights: fla-org/flash-linear-attention
Overview
- 2 Merged pull requests
- 0 Open pull requests
- 2 Closed issues
- 0 New issues
2 Pull requests merged by 2 people
- [RWKV7] Strictly initialize rwkv7 according to RWKV-LM (#387, merged May 3, 2025)
- [Utils] Add fused pack/unpack fns (#386, merged May 2, 2025)
2 Issues closed by 2 people
- [Bug] FLA models fail to return logits when labels are not provided (#237, closed May 4, 2025)
7 Unresolved conversations
Conversations sometimes continue on older items that aren't yet closed. Below is a list of all Issues and Pull Requests with unresolved conversations.
- [RFC] Implement Triton Version of Token Shift and ShortConv with varlen support (#319, commented on May 3, 2025 • 0 new comments)
- [Bug] `GLA` fails when upgrade to `triton-nightly` (#288, commented on May 4, 2025 • 0 new comments)
- DeepSeek initialization underperforms when the learning rate is increased for RoPE-based transformer (#266, commented on May 4, 2025 • 0 new comments)
- [RFC] Make all kernels compatible with Triton3.2 on H100 GPUs (#252, commented on May 4, 2025 • 0 new comments)
- [Misc.] Add activations for non-cuda Backends (#174, commented on May 2, 2025 • 0 new comments)
- [Linear Attention] Update fused_recurrent.py for inference with nomalization=true (#268, commented on May 2, 2025 • 0 new comments)
- Use `tl.exp2` for all gating operations (#361, commented on May 2, 2025 • 0 new comments)