Usage of Vector API in Jlama #19
anbusampath
started this conversation in
General
Replies: 1 comment 1 reply
-
Hi, LLMs perform a huge number of matrix multiplications. In fact, that's where roughly 90% of the processing time goes when running inference. You can run a matrix multiplication on a CPU with plain old Java loops, or you can run it faster using the Vector API. You can also run it on a GPU. You can see both implementations in Jlama.
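To make the comparison concrete, here is a minimal plain-loop sketch of the hot operation (a matrix-vector multiply), not Jlama's actual code. The class and method names here are made up for illustration; a SIMD version would replace the inner loop with `jdk.incubator.vector` operations.

```java
// Plain-Java matrix-vector multiply: the kind of loop that dominates
// transformer inference time. Illustrative sketch, not Jlama's code.
public class MatMul {
    // Computes y = A * x, where A is m x n, stored row-major in a flat array.
    static float[] matVec(float[] a, float[] x, int m, int n) {
        float[] y = new float[m];
        for (int i = 0; i < m; i++) {
            float sum = 0f;
            // This scalar inner loop is what the Vector API speeds up:
            // a SIMD version loads several floats at once (FloatVector.fromArray),
            // multiply-accumulates lanes (fma), and reduces at the end (reduceLanes).
            for (int j = 0; j < n; j++) {
                sum += a[i * n + j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        float[] a = {1, 2, 3, 4, 5, 6}; // 2 x 3 matrix
        float[] x = {1, 1, 1};
        float[] y = matVec(a, x, 2, 3);
        System.out.println(y[0] + " " + y[1]); // prints: 6.0 15.0
    }
}
```

Note that the Vector API lives in the incubator module, so a SIMD variant needs to be compiled and run with `--add-modules jdk.incubator.vector`.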
-
I am new to LLMs. My understanding is that Java's Vector API uses SIMD instructions to vectorize code across different machine architectures. Where is SIMD used in Jlama? Since LLMs use embedding models to communicate (input/output), why do we need the Vector API?