Description
To get the example python demo/qwen3/demo.py running, I followed the Quick Installation section but had to do roughly the following instead:
git clone --recursive --branch mpk https://www.github.com/mirage-project/mirage
cd mirage
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
pip install transformers mpi4py google protobuf google protobuf3 tg4perfetto
pip install -e . -v
export MIRAGE_HOME=$(pwd)
This is with Python 3.10 on Ubuntu 22.04 in WSL2 (without conda). After that I was able to run python demo/qwen3/demo.py and python demo/qwen3/demo.py --use-mirage, and to see the lower per-token latency of the compiled megakernel.
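If MIRAGE_HOME is exported with a lowercase $pwd, bash expands it to an empty string (only $PWD or $(pwd) give the current directory, unless you have defined a variable named pwd yourself). A quick sanity check before running the demo can catch that. This is a minimal sketch: the function name mirage_home_problem is hypothetical, and the include/ check is an assumption based on the nvcc -I paths in the logs below, which point at an include/ directory in the checkout root.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def mirage_home_problem(env: Mapping[str, str] | None = None) -> str | None:
    """Describe what looks wrong with MIRAGE_HOME, or return None if it looks usable."""
    env = os.environ if env is None else env
    value = env.get("MIRAGE_HOME", "")
    if not value:
        # e.g. `export MIRAGE_HOME=$pwd` in bash exports an empty string
        return "MIRAGE_HOME is unset or empty; use $PWD or $(pwd) from the repo root"
    root = Path(value)
    if not root.is_dir():
        return f"MIRAGE_HOME={value!r} is not a directory"
    # Assumption: the checkout root contains include/ (see the nvcc -I flags)
    if not (root / "include").is_dir():
        return f"MIRAGE_HOME={value!r} has no include/ subdirectory"
    return None


if __name__ == "__main__":
    print(mirage_home_problem() or "MIRAGE_HOME looks OK")
```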
Tracing with tg4perfetto still seems to fail, though, when I run python demo/qwen3/demo.py --use-mirage --profiling. A new tg4perfetto release came out just yesterday, but neither 0.0.4 nor 0.0.6 worked in my case. Maybe it needs to be installed from a particular git commit?
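Since the errors below are raised from protobuf rather than from tg4perfetto itself, it helps to report the exact installed versions of both when comparing releases. A small sketch using the standard library's importlib.metadata; the helper name is hypothetical and the package list is simply the packages installed above.

```python
from __future__ import annotations

from importlib import metadata


def installed_versions(names: tuple[str, ...]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: dict[str, str | None] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ("tg4perfetto", "protobuf", "transformers", "mpi4py")
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```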
Here are the errors from some attempts:
Version 0.0.4:
$ pip install tg4perfetto==0.0.4
Collecting tg4perfetto==0.0.4
Using cached tg4perfetto-0.0.4-py3-none-any.whl (208 kB)
Installing collected packages: tg4perfetto
Successfully installed tg4perfetto-0.0.4
$ python demo/qwen3/demo.py --use-mirage --profiling
Input arguments: Namespace(use_mirage=True, profiling=True)
world_size(1) rank(0)
Loading checkpoint shards: 100%|███████████████████████████████████| 5/5 [00:17<00:00, 3.44s/it]
Triggered events: 531
Executed tasks: 12002
Triggered events: 531
Executed tasks: 12002
Compiling megakernel using the following command line:
['/usr/local/cuda-12.6/bin/nvcc', '/tmp/tmpw913bysw/test.cu', '-O3', '-I/usr/include/python3.10', '-I/home/.../mirage/python/mirage/../../include', '-I/home/.../mirage/python/mirage/../../include/mirage/persistent_kernel', '-I/home/.../mirage/python/mirage/../../deps/cutlass/include', '-Ideps/json/include', '-arch=native', '-shared', '-std=c++17', '-rdc=true', '-use_fast_math', '-Xcompiler=-fPIC', '--expt-relaxed-constexpr', '-o', '/tmp/tmpw913bysw/test.cpython-38-x86_64-linux-gnu.so']
Finished megakernel compilation...
[SCHD] sched_id(37) first_worker(74) last_worker(76)
[SCHD] sched_id(16) first_worker(32) last_worker(34)
...
[SCHD] sched_id(35) first_worker(70) last_worker(72)
Finished Launch Persistent Kernel
Traceback (most recent call last):
File "/home/.../mirage/demo/qwen3/demo.py", line 436, in <module>
mpk()
File "/home/.../mirage/python/mirage/persistent_kernel.py", line 591, in __call__
from .profiler_persistent import export_to_perfetto_trace
File "/home/.../mirage/python/mirage/profiler_persistent.py", line 9, in <module>
from tg4perfetto import TraceGenerator
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/__init__.py", line 1, in <module>
from ._tgen import TraceGenerator
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/_tgen.py", line 1, in <module>
from ._core import _BaseTraceGenerator
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/_core.py", line 1, in <module>
from . import perfetto_trace_pb2 as pb2
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/perfetto_trace_pb2.py", line 32, in <module>
_descriptor.EnumValueDescriptor(
File "/home/.../mirage/.venv/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 933, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
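Workaround 1 in the message above is effectively a version threshold: protobuf releases newer than 3.20.x reject the old generated tg4perfetto/perfetto_trace_pb2.py. The 0.0.6 attempt below shows protobuf 6.31.1 in this venv, which is on the failing side. A sketch of that check (the function name is hypothetical, and it compares only major.minor):

```python
from importlib import metadata


def protobuf_needs_workaround(version: str) -> bool:
    """True if `version` is newer than 3.20.x, the range the error suggests downgrading to."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)


if __name__ == "__main__":
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        print("protobuf is not installed")
    else:
        print(f"protobuf {installed}: workaround needed = {protobuf_needs_workaround(installed)}")
```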
Latest version (0.0.6):
$ pip uninstall tg4perfetto
Found existing installation: tg4perfetto 0.0.4
Uninstalling tg4perfetto-0.0.4:
...
Proceed (Y/n)?
Successfully uninstalled tg4perfetto-0.0.4
$ pip install tg4perfetto
Collecting tg4perfetto
Using cached tg4perfetto-0.0.6-py3-none-any.whl
Requirement already satisfied: protobuf in ./.venv/lib/python3.10/site-packages (from tg4perfetto) (6.31.1)
Installing collected packages: tg4perfetto
Successfully installed tg4perfetto-0.0.6
$ python demo/qwen3/demo.py --use-mirage --profiling
Input arguments: Namespace(use_mirage=True, profiling=True)
world_size(1) rank(0)
Loading checkpoint shards: 100%|███████████████████████████████████| 5/5 [00:18<00:00, 3.70s/it]
Triggered events: 531
Executed tasks: 12002
Triggered events: 531
Executed tasks: 12002
Compiling megakernel using the following command line:
['/usr/local/cuda-12.6/bin/nvcc', '/tmp/tmps8s3m7k1/test.cu', '-O3', '-I/usr/include/python3.10', '-I/home/.../mirage/python/mirage/../../include', '-I/home/.../mirage/python/mirage/../../include/mirage/persistent_kernel', '-I/home/.../mirage/python/mirage/../../deps/cutlass/include', '-Ideps/json/include', '-arch=native', '-shared', '-std=c++17', '-rdc=true', '-use_fast_math', '-Xcompiler=-fPIC', '--expt-relaxed-constexpr', '-o', '/tmp/tmps8s3m7k1/test.cpython-38-x86_64-linux-gnu.so']
Finished megakernel compilation...
[SCHD] sched_id(19) first_worker(38) last_worker(40)
...
[SCHD] sched_id(30) first_worker(60) last_worker(62)
[SCHD] sched_id(35) first_worker(70) last_worker(72)
Finished Launch Persistent Kernel
Traceback (most recent call last):
File "/home/.../mirage/demo/qwen3/demo.py", line 436, in <module>
mpk()
File "/home/.../mirage/python/mirage/persistent_kernel.py", line 591, in __call__
from .profiler_persistent import export_to_perfetto_trace
File "/home/.../mirage/python/mirage/profiler_persistent.py", line 9, in <module>
from tg4perfetto import TraceGenerator
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/__init__.py", line 1, in <module>
from ._tgen import TraceGenerator
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/_tgen.py", line 1, in <module>
from ._core import _BaseTraceGenerator
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/_core.py", line 1, in <module>
from . import perfetto_trace_pb2 as pb2
File "/home/.../mirage/.venv/lib/python3.10/site-packages/tg4perfetto/perfetto_trace_pb2.py", line 33, in <module>
_descriptor.EnumValueDescriptor(
File "/home/.../mirage/.venv/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 933, in __new__
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
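Workaround 2 can also be applied from inside a script instead of on the command line, as long as the variable is set before protobuf is first imported, since protobuf reads it when it selects its backend at import time. A minimal sketch; the helper name is hypothetical, and it conservatively reports failure if any google.protobuf module is already loaded.

```python
import os
import sys


def use_pure_python_protobuf() -> bool:
    """Request protobuf's pure-Python backend for this process.

    Returns False if a google.protobuf module is already imported, in which case
    the backend has probably been chosen already and the setting may not apply.
    """
    already_loaded = any(
        name == "google.protobuf" or name.startswith("google.protobuf.")
        for name in list(sys.modules)  # copy: sys.modules may change while iterating
    )
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return not already_loaded
```

It would need to run at the very top of the entry script, before transformers or mirage are imported, since either may pull in protobuf indirectly.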
Trying the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python workaround gets past the protobuf error, but the export then fails with a KeyError:
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python python demo/qwen3/demo.py --use-mirage --profiling
Input arguments: Namespace(use_mirage=True, profiling=True)
world_size(1) rank(0)
Loading checkpoint shards: 100%|███████████████████████████████████| 5/5 [00:03<00:00, 1.27it/s]
Triggered events: 531
Executed tasks: 12002
Triggered events: 531
Executed tasks: 12002
Compiling megakernel using the following command line:
['/usr/local/cuda-12.6/bin/nvcc', '/tmp/tmplzj2xwtr/test.cu', '-O3', '-I/usr/include/python3.10', '-I/home/.../mirage/python/mirage/../../include', '-I/home/.../mirage/python/mirage/../../include/mirage/persistent_kernel', '-I/home/.../mirage/python/mirage/../../deps/cutlass/include', '-Ideps/json/include', '-arch=native', '-shared', '-std=c++17', '-rdc=true', '-use_fast_math', '-Xcompiler=-fPIC', '--expt-relaxed-constexpr', '-o', '/tmp/tmplzj2xwtr/test.cpython-38-x86_64-linux-gnu.so']
Finished megakernel compilation...
[SCHD] sched_id(15) first_worker(30) last_worker(32)
[SCHD] sched_id(18) first_worker(36) last_worker(38)
...
[SCHD] sched_id(16) first_worker(32) last_worker(34)
Finished Launch Persistent Kernel
Traceback (most recent call last):
File "/home/.../mirage/demo/qwen3/demo.py", line 436, in <module>
mpk()
File "/home/.../mirage/python/mirage/persistent_kernel.py", line 593, in __call__
export_to_perfetto_trace(
File "/home/.../mirage/python/mirage/profiler_persistent.py", line 83, in export_to_perfetto_trace
event = event_name_list[event_idx] + f"_{event_no}"
KeyError: 0
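A KeyError: 0, rather than an IndexError, suggests that event_name_list is a dict-like mapping with no name registered for event index 0 (indexing a plain list out of range would raise IndexError). Purely as a local debugging aid, and not as the project's fix, the failing line could fall back to a placeholder so the export finishes and the trace can be inspected. The helper name event_label and the unknown_event placeholder are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping


def event_label(event_name_list: Mapping[int, str], event_idx: int, event_no: int) -> str:
    """Tolerant variant of `event_name_list[event_idx] + f"_{event_no}"`.

    Uses a placeholder name instead of raising KeyError for unregistered indices.
    """
    name = event_name_list.get(event_idx, f"unknown_event{event_idx}")
    return name + f"_{event_no}"
```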
The trace file mirage_0.perfetto-trace is created, but I don't know what's in it exactly.
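A .perfetto-trace file is a serialized Perfetto Trace protobuf message, i.e. a sequence of length-delimited packet records (field 1), and it can be opened in the Perfetto web UI. As a dependency-free alternative, the sketch below counts the top-level packets, which shows whether anything was written before the KeyError. It only walks the protobuf wire format and does not decode packet contents; the function names are hypothetical.

```python
def _read_varint(buf: bytes, pos: int) -> tuple:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def count_trace_packets(data: bytes) -> int:
    """Count top-level `packet` records (field 1, length-delimited) in a Trace message."""
    pos = 0
    count = 0
    while pos < len(data):
        key, pos = _read_varint(data, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError(f"unexpected wire type {wire_type} at offset {pos}")
        length, pos = _read_varint(data, pos)
        if pos + length > len(data):
            raise ValueError("file ends in the middle of a packet")
        pos += length  # skip the packet payload without decoding it
        if field == 1:
            count += 1
    return count
```

For example, count_trace_packets(Path("mirage_0.perfetto-trace").read_bytes()) (with pathlib.Path) returns the packet count, and raises ValueError if the file stops partway through a record.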