Description
Hi, Mirage Team!

Thanks for your awesome work! I've cloned your repo onto my server with a single RTX 4090 and ran the Qwen3-8B demo as:

```
python demo/qwen3/demo.py
```

and

```
python demo/qwen3/demo.py --use-mirage
```

respectively. According to the output, PyTorch takes about 25 ms per token while Mirage takes about 15 ms (a 40% speedup). My questions are (I haven't made any changes to your code):
- Is this the expected speed and speedup ratio?
- Does the PyTorch baseline include any optimizations?
- In your demo, compilation only takes about 13 s (I guess this is because the CUDA kernels have been pre-compiled and stored in core.so under the python folder). If I want to start a computation graph search from scratch, what should I do?
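For context, the per-token numbers above come from the demo's own output; as a sanity check I also timed decoding myself with a small wrapper roughly like the sketch below. This is my own measurement code, not part of the Mirage demo, and `generate_step` is a hypothetical stand-in for whatever callable runs one decode step:

```python
import time

def measure_per_token_latency(generate_step, num_tokens=50, warmup=5):
    """Time a per-token generation callable; return mean latency in ms/token.

    generate_step: a zero-argument callable that produces one token
    (a stand-in here; in practice, wrap the demo's decode step).
    """
    # Warm-up iterations so one-time setup costs don't skew the mean.
    for _ in range(warmup):
        generate_step()
    start = time.perf_counter()
    for _ in range(num_tokens):
        generate_step()
    elapsed = time.perf_counter() - start
    return elapsed / num_tokens * 1000.0

# Example with a dummy step; replace the lambda with the real decode call.
latency_ms = measure_per_token_latency(lambda: time.sleep(0.001), num_tokens=20)
print(f"{latency_ms:.2f} ms/token")
```

(For GPU code one would normally also call `torch.cuda.synchronize()` before reading the clock, so the timer isn't fooled by asynchronous kernel launches.)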