Is AirLLM faster than llama.cpp? · Issue #206 · lyogavin/airllm · GitHub
Is AirLLM faster than llama.cpp? #206
Open
@Lizonghang

Description

Dear Lyogavin,

Thanks for your wonderful work. I have a question: does AirLLM run faster than llama.cpp? Do you have any data on that?

As far as I know, llama.cpp uses mmap to manage memory. When the computation hits a page fault, the OS automatically loads the needed tensor weights from disk into memory and the computation continues; it also evicts less-used pages when memory pressure is high, all managed by the OS. So llama.cpp also supports very large LLMs, similar to what AirLLM provides.
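To illustrate what I mean, here is a minimal sketch of that demand-paging behavior in Python (the file name model-weights.bin and the tensor shape are hypothetical; llama.cpp implements this in C++ over its own GGUF files):

```python
import mmap
import numpy as np

with open("model-weights.bin", "rb") as f:
    # Map the whole weight file; nothing is read from disk yet.
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Viewing a tensor region and touching it triggers page faults: the OS
# loads those pages from disk on first access and may evict them later
# under memory pressure, with no explicit load/unload calls in user code.
tensor = np.frombuffer(buf, dtype=np.float16, count=4096 * 4096)
row_sum = tensor[:4096].astype(np.float32).sum()
```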

I noticed that AirLLM uses prefetching to overlap disk I/O latency with computation. Will this be faster than llama.cpp (with mmap enabled)? And how large is the improvement?
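By prefetching I mean something like the following sketch (not AirLLM's actual code; load_layer_weights and run_layer are hypothetical stand-ins), where the read for layer i+1 is issued on a background thread while layer i computes:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def load_layer_weights(i):
    # Stand-in for reading layer i's tensors from disk; a real loader
    # would deserialize them from the checkpoint shards.
    return np.eye(1024, dtype=np.float32)

def run_layer(weights, hidden):
    # Stand-in for one transformer layer's forward pass.
    return weights @ hidden

def forward(hidden, num_layers):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_layer_weights, 0)
        for i in range(num_layers):
            weights = future.result()  # wait for layer i's weights
            if i + 1 < num_layers:
                # Issue the read for layer i+1 now, so its disk I/O
                # overlaps with layer i's computation below.
                future = pool.submit(load_layer_weights, i + 1)
            hidden = run_layer(weights, hidden)
    return hidden

out = forward(np.ones(1024, dtype=np.float32), num_layers=4)
```

The overlap is real even in Python, since blocking disk reads release the GIL while the compute runs.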
