Description
Hi, Mirage team!

Awesome work! I've already finished your Qwen3 demo and am now working on parallel inference. Could you tell me how to load a compiled Mirage kernel onto multiple GPUs? Is it compatible with torch.distributed? And what should I do (or what do you plan to do) if a model is too large to compile into a single mega-kernel because my CUDA memory is limited?
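For the memory-limited case, the workaround I'm imagining is splitting the model into contiguous layer segments that each fit the memory budget and compiling each segment separately instead of one mega-kernel. Here's a rough sketch of that partitioning idea (pure Python, nothing here is real Mirage API; `partition_layers` and the per-layer cost estimates are my own hypothetical names):

```python
def partition_layers(layer_costs, budget):
    """Greedily split per-layer memory costs (e.g. in GB) into contiguous
    segments whose total cost stays within `budget`. A single layer that
    alone exceeds the budget still gets its own segment."""
    segments, current, current_cost = [], [], 0
    for i, cost in enumerate(layer_costs):
        # Close the current segment if adding this layer would overflow it.
        if current and current_cost + cost > budget:
            segments.append(current)
            current, current_cost = [], 0
        current.append(i)
        current_cost += cost
    if current:
        segments.append(current)
    return segments

# Example: six layers with these estimated costs (GB), 8 GB budget.
print(partition_layers([3, 3, 3, 2, 5, 1], 8))  # -> [[0, 1], [2, 3], [4, 5]]
```

Each segment would then be compiled on its own, so peak compile-time memory is bounded by the largest segment rather than the whole model. Is something along these lines feasible with Mirage, or is there a supported way to do it?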