Describe the Bug
I am collecting inference traces with the suggested PyTorch profiler calls and trying to convert them with the latest Chakra code. Since HTA was added, the trace linker appears to rely on ProfilerStep annotations in the traces; when they are absent, linking fails.
Steps to Reproduce
- Collect traces using `torch.profiler.profile`, making use of `profiler.start()` and `profiler.stop()` but not `profiler.step()`
- Attempt linking of traces using `chakra_trace_link`
Expected Behavior
- A linked trace is created
Screenshots
Log output of `chakra_trace_link`:
WARNING:hta:Overall parsing of /home/.../PyCharmProjects/chakra/tests/data/new/device_trace.json in 1.24 seconds; current PID:206409
WARNING:hta:leaving parse_multiple_ranks duration=1.31 seconds
WARNING:hta:leaving parse_traces duration=1.31 seconds
WARNING:hta:ProfilerStep not found in the trace. The analysis result may not be accurate.
WARNING:hta:Trace does not contain CUDA Synchronization events so the results of analysis could be inaccurate.
WARNING:hta:Please see this PR to learn how to enable CUDA sync events https://github.com/pytorch/pytorch/pull/105187
ERROR:hta:Could not find annotation ProfilerStep in the trace.
Traceback (most recent call last):
File "/home/.../PyCharmProjects/chakra/.venv/bin/chakra_trace_link", line 8, in <module>
sys.exit(main())
~~~~^^
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_link.py", line 47, in main
linker.link(args.rank, args.chakra_host_trace, args.chakra_device_trace, args.output_file)
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_linker.py", line 74, in link
sync_deps = self.load_sync_dependencies(rank, chakra_device_trace)
File "/home/.../PyCharmProjects/chakra/.venv/lib/python3.13/site-packages/chakra/src/trace_link/trace_linker.py", line 125, in load_sync_dependencies
cp_graph, success = trace_analysis.critical_path_analysis(
^^^^^^^^^^^^^^^^^
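To confirm that the missing annotation is the trigger, a pre-flight check along these lines could be run on the device trace before linking. This is only a sketch: it assumes the trace is Chrome-trace-style JSON with a top-level `traceEvents` list and that step annotations appear as events whose `name` starts with `ProfilerStep`. The helper name `count_profiler_steps` is hypothetical and not part of Chakra or HTA.

```python
import json
from typing import Any

# Annotation name that HTA reports as missing in the log above.
STEP_PREFIX = "ProfilerStep"


def count_profiler_steps(trace: dict[str, Any]) -> int:
    """Count events whose name starts with 'ProfilerStep'.

    Assumes a Chrome-trace-style dict with a top-level 'traceEvents' list;
    a missing list is treated as zero events.
    """
    events = trace.get("traceEvents", [])
    return sum(
        1
        for event in events
        if isinstance(event, dict)
        and str(event.get("name", "")).startswith(STEP_PREFIX)
    )


def load_trace(path: str) -> dict[str, Any]:
    """Load an uncompressed JSON trace file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Tiny in-memory traces standing in for real profiler output (made-up events).
with_steps = {"traceEvents": [{"name": "ProfilerStep#0"}, {"name": "aten::mm"}]}
without_steps = {"traceEvents": [{"name": "aten::mm"}]}

print(count_profiler_steps(with_steps))
print(count_profiler_steps(without_steps))
```

For a real trace, `count_profiler_steps(load_trace(path))` returning 0 would match the "Could not find annotation ProfilerStep" error above.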