Very insightful work! However, since the original MLA uses the same down-projection matrix and the same latent vector for both K and V, I am curious about the performance impact of transferring GQA to MLA in this way.
Have the authors explored this adaptation? If so, could you share any insights or findings on how it impacts performance? 👀
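
To make the question concrete, here is a minimal sketch of the two variants I have in mind. It is only an illustration under my own assumptions: the toy dimensions, the helper functions, and all weight names (`W_dkv`, `W_dk`, `W_dv`, `W_uk`, `W_uv`) are hypothetical, the weights are random, and the decoupled RoPE part of MLA is omitted. Variant B is just one possible reading of a non-shared design, not a claim about how this repository implements the conversion.

```python
import random

def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def rand_matrix(rows, cols, rng):
    """Random weights; stand-ins for trained projection matrices."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_latent, d_head = 8, 4, 6  # toy sizes, chosen only for illustration

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]  # hidden state of one token

# Up-projections from latent space to K and V (reused by both variants).
W_uk = rand_matrix(d_latent, d_head, rng)
W_uv = rand_matrix(d_latent, d_head, rng)

# Variant A: one shared down-projection and one latent for K and V,
# as in the original MLA. Only c_kv needs to be cached per token.
W_dkv = rand_matrix(d_model, d_latent, rng)
c_kv = matvec(x, W_dkv)
k_shared = matvec(c_kv, W_uk)
v_shared = matvec(c_kv, W_uv)
cache_shared = len(c_kv)

# Variant B (hypothetical): separate down-projections and latents for K and V.
# Both c_k and c_v have to be cached per token.
W_dk = rand_matrix(d_model, d_latent, rng)
W_dv = rand_matrix(d_model, d_latent, rng)
c_k = matvec(x, W_dk)
c_v = matvec(x, W_dv)
k_sep = matvec(c_k, W_uk)
v_sep = matvec(c_v, W_uv)
cache_separate = len(c_k) + len(c_v)

print(cache_shared, cache_separate)  # cached values per token: 4 vs 8
```

With the same latent width, the shared variant caches half as many values per token, so I wonder how the two compare in quality at an equal cache budget.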