Profiler with Kineto has "orphan+childless" function events (on P100) #54267
Comments
Thanks for reporting. We currently do include GPU events that have no corresponding CPU event by default. Let me take a look at where that comes from in this particular case.
@gdankel Thanks for investigating. My confusion stems from the fact that I'm wrapping my entire profiled code in a

```python
with tprofiler.profile(model, use_cuda=True, use_kineto=True) as prof:
    with tprofiler.record_function('Overall'):
        output = model(input_batch)
        torch.cuda.synchronize()
```
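(For reference, the single-root expectation behind this wrapping can be sketched without torch. The `Event` class below is a hypothetical stand-in for the profiler's `FunctionEvent`, modeling only its `cpu_parent` attribute:)

```python
# Hypothetical stand-in for FunctionEvent; only the cpu_parent link is
# modeled, to illustrate the "everything nests under one top-level
# record_function" expectation.
class Event:
    def __init__(self, name, parent=None):
        self.name = name
        self.cpu_parent = parent

overall = Event("Overall")
events = [
    overall,
    Event("aten::conv2d", overall),
    Event("aten::relu", overall),
]

# Events with no parent are roots of the trace.
roots = [e.name for e in events if e.cpu_parent is None]
print(roots)  # a fully nested trace has exactly one root: ['Overall']
```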
I believe that might just not be true anymore; we treat on-device events as a separate class of events, outside the CPU hierarchy, since they are not executed on the CPU. We do, though, save the information (correlation id) needed to associate on-device events with CPU events.
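(The correlation-id association described here can be sketched with plain dicts; the event records and kernel names below are made up for illustration:)

```python
from collections import defaultdict

# Hypothetical event records: each on-device (kernel) event carries the
# same correlation id as the CPU-side op that launched it.
cpu_events = [
    {"name": "aten::mm", "correlation_id": 1},
    {"name": "aten::relu", "correlation_id": 2},
]
gpu_events = [
    {"name": "volta_sgemm_kernel", "correlation_id": 1},
    {"name": "relu_kernel", "correlation_id": 2},
]

# Group device events by correlation id, then attach them to the
# launching CPU op.
by_corr = defaultdict(list)
for k in gpu_events:
    by_corr[k["correlation_id"]].append(k["name"])

linked = {e["name"]: by_corr[e["correlation_id"]] for e in cpu_events}
print(linked)  # {'aten::mm': ['volta_sgemm_kernel'], 'aten::relu': ['relu_kernel']}
```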
We have basic docs for the new profiler, but we'll make sure to extend the existing tutorial to cover it for the 1.9 release (the 1.8 release has an experimental preview of the new profiler).
@gdankel @ilia-cher I think I'm still a bit confused: is this a bug or not? What does it mean that these GPU events don't belong within a CPU event, especially the top-level `record_function`?
the purpose of …
Closing old issue that has an answer. |
🐛 Bug
With the `use_kineto=True` flag on a P100 GPU, the torch profiler returns some `FunctionEvent`s that have neither a parent nor children. For AlexNet, here are the names of these events:
To Reproduce
https://colab.research.google.com/drive/1kiOtdCilQ96lM_3WhT_A14PGOjEGluKE#scrollTo=pV6sSyiDqVP-
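(A minimal way to filter for such "orphan+childless" events, assuming `FunctionEvent`'s `cpu_parent`/`cpu_children` attributes; the `Event` dataclass is a hypothetical stand-in so the snippet runs standalone:)

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for torch.autograd.profiler.FunctionEvent;
# only the cpu_parent / cpu_children attributes matter for this check.
@dataclass
class Event:
    name: str
    cpu_parent: "Event | None" = None
    cpu_children: list = field(default_factory=list)

def orphan_childless(events):
    """Names of events with neither a parent nor any children."""
    return [e.name for e in events
            if e.cpu_parent is None and not e.cpu_children]

root = Event("Overall")
child = Event("aten::conv2d", cpu_parent=root)
root.cpu_children.append(child)
stray = Event("aten::dropout")  # attached to nothing

print(orphan_childless([root, child, stray]))  # ['aten::dropout']
```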
Expected behavior
I had expected the set of events to be a tree with a single root, especially if there is a top-level
record_function
and we are doingtorch.cuda.synchronize
. This has been the case withuse_kineto=False
in my experience. Perhaps my mental model is incorrect, in which case please point me to any documentation about this -- I have not been able to find it.Environment
Additional context
On a K80, there was only a single `aten::dropout` event that showed up in this list.
cc @ilia-cher @gdankel @ngimel