Import and refactor trace_link.py by TaekyungHeo · Pull Request #47 · mlcommons/chakra

Merged
merged 24 commits into main from import-trace-link on May 9, 2024

Conversation

@TaekyungHeo (Contributor) commented on May 8, 2024

Summary

Import and refactor trace_link.py

Test Plan

1. Run trace_link

chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_0.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_0.json --output-file ~/megatron_0.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_1.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_1.json --output-file ~/megatron_1.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_2.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_2.json --output-file ~/megatron_2.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_3.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_3.json --output-file ~/megatron_3.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_4.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_4.json --output-file ~/megatron_4.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_5.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_5.json --output-file ~/megatron_5.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_6.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_6.json --output-file ~/megatron_6.json &
chakra_trace_link --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_7.json --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_7.json --output-file ~/megatron_7.json &
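
The eight per-rank invocations above can equivalently be run as a loop; a minimal sketch, assuming the same directory layout and flags as the commands above:

for rank in $(seq 0 7); do
  chakra_trace_link \
    --pytorch-et-file /Users/theo/Downloads/llama_pytorch24.05/megatron_et_${rank}.json \
    --kineto-file /Users/theo/Downloads/llama_pytorch24.05/megatron_kineto_${rank}.json \
    --output-file ~/megatron_${rank}.json &
done
wait  # block until all eight linker processes have finished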

2. Run et_converter

chakra_converter --input_filename ~/megatron_0.json --output_filename megatron_0.chakra --input_type PyTorch > /tmp/rank_0 &
chakra_converter --input_filename ~/megatron_1.json --output_filename megatron_1.chakra --input_type PyTorch > /tmp/rank_1 &
chakra_converter --input_filename ~/megatron_2.json --output_filename megatron_2.chakra --input_type PyTorch > /tmp/rank_2 &
chakra_converter --input_filename ~/megatron_3.json --output_filename megatron_3.chakra --input_type PyTorch > /tmp/rank_3 &
chakra_converter --input_filename ~/megatron_4.json --output_filename megatron_4.chakra --input_type PyTorch > /tmp/rank_4 &
chakra_converter --input_filename ~/megatron_5.json --output_filename megatron_5.chakra --input_type PyTorch > /tmp/rank_5 &
chakra_converter --input_filename ~/megatron_6.json --output_filename megatron_6.chakra --input_type PyTorch > /tmp/rank_6 &
chakra_converter --input_filename ~/megatron_7.json --output_filename megatron_7.chakra --input_type PyTorch > /tmp/rank_7 &
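
As in step 1, the converter runs can be expressed as a loop; a minimal sketch, reusing the filenames and flags from the commands above:

for rank in $(seq 0 7); do
  chakra_converter \
    --input_filename ~/megatron_${rank}.json \
    --output_filename megatron_${rank}.chakra \
    --input_type PyTorch > /tmp/rank_${rank} &
done
wait  # wait for all eight converter processes before checking the results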

3. Results
[Screenshot of results, 2024-05-08 at 7:42:27 PM]

@TaekyungHeo TaekyungHeo requested a review from a team as a code owner May 8, 2024 21:35
github-actions bot commented May 8, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

TaekyungHeo added 18 commits May 8, 2024 18:51
The `handle_kineto_segmentation` function is intended to support Kineto traces that span multiple iterations by splitting a trace into several segments according to the provided annotations. Unfortunately, this function is not operating as expected, leading to errors. It is advisable to remove it.
The multi-iteration support feature for PyTorch execution traces is designed to
facilitate the handling of traces over multiple iterations. Unfortunately, this
feature is not functioning as expected and is leading to errors. It is advisable
to remove it.
This commit introduces support for inter-thread dependencies within the Chakra
framework. By examining Kineto traces via chrome://tracing, one can observe
multiple CPU threads and their implicit dependencies. This update explicitly
encodes these dependencies in the output trace, enabling accurate handling by
subsequent tools.
This commit adds stream ID encoding to GPU operators. This ensures that all
operators within the same stream are executed in the correct order, supporting
intra-stream dependencies.
Introduced exclusive duration calculation for Kineto operators in the TraceLinker class. This update differentiates between inclusive and exclusive durations, providing a clearer distinction in the profiling data. Exclusive durations are now calculated to identify the actual time spent in individual operations, excluding overlaps with child operators. For example, an operator with an inclusive duration of 10 ms whose child operators account for 7 ms has an exclusive duration of 3 ms.
@TaekyungHeo TaekyungHeo force-pushed the import-trace-link branch from 7bd7924 to f4026c6 on May 8, 2024 at 22:52
@TaekyungHeo TaekyungHeo changed the title Remove test_trace_link.py Import and refactor trace_link.py May 8, 2024
@TaekyungHeo TaekyungHeo force-pushed the import-trace-link branch 2 times, most recently from 5d33bf9 to 6b6dc29 on May 8, 2024 at 23:03
@TaekyungHeo TaekyungHeo force-pushed the import-trace-link branch from 6b6dc29 to 6a213c9 on May 8, 2024 at 23:04
@srinivas212 srinivas212 merged commit ee34486 into main May 9, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 9, 2024
@TaekyungHeo TaekyungHeo deleted the import-trace-link branch May 9, 2024 18:28