Description
Hi, Mirage Team!

Thanks for your awesome work! I've cloned your repo onto my server with a single RTX 4090 and ran the Qwen3-8B demo as:

```
python demo/qwen3/demo.py
```

and

```
python demo/qwen3/demo.py --use-mirage
```

respectively. According to the output, PyTorch takes about 25 ms per token while Mirage takes about 15 ms (a 40% speedup). My questions are (I haven't made any changes to your code):
- Is this the expected speed and speedup ratio?
- Does the PyTorch baseline include any optimizations?
- In your demo, compilation only takes about 13 s (I guess this is because the CUDA kernels have been pre-compiled and stored in core.so under the python folder). If I want to start a computation graph search from scratch, what should I do?
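For context, the per-token numbers above come from the demo's own output; as a sanity check I also timed decoding myself with a small wrapper roughly like the sketch below. This is my own measurement code, not part of the Mirage demo, and `generate_step` is a hypothetical stand-in for whatever callable runs one decode step:

```python
import time

def measure_per_token_latency(generate_step, num_tokens=50, warmup=5):
    """Time a per-token generation callable; return mean latency in ms/token.

    generate_step: a zero-argument callable that produces one token
    (a stand-in here; in practice, wrap the demo's decode step).
    """
    # Warm-up iterations so one-time setup costs don't skew the mean.
    for _ in range(warmup):
        generate_step()
    start = time.perf_counter()
    for _ in range(num_tokens):
        generate_step()
    elapsed = time.perf_counter() - start
    return elapsed / num_tokens * 1000.0

# Example with a dummy step; replace the lambda with the real decode call.
latency_ms = measure_per_token_latency(lambda: time.sleep(0.001), num_tokens=20)
print(f"{latency_ms:.2f} ms/token")
```

(For GPU code one would normally also call `torch.cuda.synchronize()` before reading the clock, so the timer isn't fooled by asynchronous kernel launches.)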