[Expected output] Qwen3 demo · Issue #362 · mirage-project/mirage · GitHub
[Expected output] Qwen3 demo #362
Open
@zhaotf16

Description

Hi, Mirage Team!
Thanks for your awesome work! I've cloned your repo to my server with a single RTX 4090 and run the Qwen3-8B demo with python demo/qwen3/demo.py and python demo/qwen3/demo.py --use-mirage, respectively. According to the output, PyTorch takes about 25 ms per token while Mirage takes about 15 ms, a roughly 40% reduction in per-token latency. My questions are (I haven't made any changes to your code):

  1. Is this the expected speed and speedup ratio?
  2. Does the PyTorch baseline model include any optimizations?
  3. In your demo, compilation only takes about 13 s (I guess this is because the CUDA kernels have been pre-compiled and stored in core.so under the python folder). If I want to start a computation graph search from scratch, what should I do?
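For reference, a rough way to sanity-check the per-token PyTorch number with plain Hugging Face Transformers is sketched below. This is only an assumption-laden sketch, not the demo's own timing code: the model ID, prompt, and generation settings are placeholders, and demo/qwen3/demo.py may measure latency differently.

```python
# Rough per-token latency check with plain Hugging Face Transformers.
# Everything here (model ID, prompt, token budget) is an assumption for
# illustration; it is NOT the timing code used by demo/qwen3/demo.py.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-8B"   # assumed checkpoint; adjust to whatever the demo loads
PROMPT = "Explain what a superoptimizer does in one paragraph."
NEW_TOKENS = 128

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16).cuda()
inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")

# Warm-up run so one-off CUDA/kernel initialization doesn't inflate the timing.
model.generate(**inputs, max_new_tokens=8, do_sample=False)
torch.cuda.synchronize()

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=NEW_TOKENS, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# The average includes the (short) prefill, so it slightly overstates decode latency.
generated = output.shape[1] - inputs["input_ids"].shape[1]
print(f"{elapsed / generated * 1000:.2f} ms per generated token ({generated} tokens)")
```

Averaging over a longer generation and discarding the warm-up run keeps one-off CUDA initialization out of the per-token figure.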

Metadata

Labels

    question (Further information is requested)
