Is AirLLM faster than llama.cpp? · Issue #206 · lyogavin/airllm · GitHub
Is AirLLM faster than llama.cpp? #206
Open
@Lizonghang

Description

Dear Lyogavin,

Thanks for your wonderful work. I have a question: does AirLLM run faster than llama.cpp? Do you have any data on that?

As far as I know, llama.cpp uses mmap to manage memory. When the computation hits a page fault, the OS automatically loads the needed tensor weights from disk into memory and the computation continues; it also evicts less-used pages when memory pressure is high, all managed by the OS. So llama.cpp also supports very large LLMs, similar to what AirLLM provides.
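To illustrate what I mean, here is a minimal sketch of that demand-paging behavior in Python (the file name model-weights.bin and the tensor shape are hypothetical; llama.cpp implements this in C++ over its own GGUF files):

```python
import mmap
import numpy as np

with open("model-weights.bin", "rb") as f:
    # Map the whole weight file; nothing is read from disk yet.
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Viewing a tensor region and touching it triggers page faults: the OS
# loads those pages from disk on first access and may evict them later
# under memory pressure, with no explicit load/unload calls in user code.
tensor = np.frombuffer(buf, dtype=np.float16, count=4096 * 4096)
row_sum = tensor[:4096].astype(np.float32).sum()
```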

I noticed that AirLLM uses prefetching to overlap disk I/O latency with computation. Will this be faster than llama.cpp (with mmap enabled)? And how large is the improvement?
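By prefetching I mean something like the following sketch (not AirLLM's actual code; load_layer_weights and run_layer are hypothetical stand-ins), where the read for layer i+1 is issued on a background thread while layer i computes:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def load_layer_weights(i):
    # Stand-in for reading layer i's tensors from disk; a real loader
    # would deserialize them from the checkpoint shards.
    return np.eye(1024, dtype=np.float32)

def run_layer(weights, hidden):
    # Stand-in for one transformer layer's forward pass.
    return weights @ hidden

def forward(hidden, num_layers):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_layer_weights, 0)
        for i in range(num_layers):
            weights = future.result()  # wait for layer i's weights
            if i + 1 < num_layers:
                # Issue the read for layer i+1 now, so its disk I/O
                # overlaps with layer i's computation below.
                future = pool.submit(load_layer_weights, i + 1)
            hidden = run_layer(weights, hidden)
    return hidden

out = forward(np.ones(1024, dtype=np.float32), num_layers=4)
```

The overlap is real even in Python, since blocking disk reads release the GIL while the compute runs.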
