VMProfile 3: Updated profiler data collection #124
Define enums for implicit VM states that make sense even though they are never explicitly stored in the g->vmstate field: JIT head, JIT loop, JIT garbage collection, and FFI.
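A minimal sketch of what such an enum might look like, with illustrative names (the real definitions are in 089c55a):

```c
/* Sketch of the VM state enum extended with implicit JIT-mode states.
** Names here are illustrative; see commit 089c55a for the real ones. */
enum {
  LJ_VMST_INTERP,    /* Interpreting bytecode. */
  LJ_VMST_C,         /* Running a C function. */
  LJ_VMST_GC,        /* Garbage collection (from the interpreter). */
  LJ_VMST_EXIT,      /* Handling a trace exit. */
  LJ_VMST_RECORD,    /* Recording a trace. */
  LJ_VMST_OPT,       /* Optimizing a trace. */
  LJ_VMST_ASM,       /* Assembling a trace. */
  /* Implicit states: never stored in g->vmstate, but distinguished by
  ** the profiler from the sampled instruction address. */
  LJ_VMST_JIT_HEAD,  /* Executing the entry ("head") part of a trace. */
  LJ_VMST_JIT_LOOP,  /* Executing the loop part of a trace. */
  LJ_VMST_JIT_GC,    /* Garbage collection triggered from JIT code. */
  LJ_VMST_FFI,       /* FFI or other unrecognized machine code. */
  LJ_VMST__MAX
};
```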
The VM now keeps track of the previous vmstate from before the most recent exit to the interpreter. The field is global_State.lasttrace. This is intended to help with profiling and diagnostics ("which trace is to blame for all these expensive exits into the interpreter?")
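Conceptually the bookkeeping is tiny; here is a sketch in which everything except the global_State.lasttrace field is an illustrative stand-in:

```c
/* Sketch: remember which trace most recently exited to the interpreter.
** TraceNo and note_trace_exit() are illustrative stand-ins. */
typedef unsigned int TraceNo;

typedef struct global_State {
  int vmstate;        /* Current VM state. */
  TraceNo lasttrace;  /* Trace that most recently exited to the interpreter. */
  /* ... many other fields ... */
} global_State;

/* Conceptually called on the path from a trace exit back into the
** interpreter, before vmstate is set back to "interpreting". */
static void note_trace_exit(global_State *g, TraceNo traceno)
{
  g->lasttrace = traceno;  /* Whom to "blame" for upcoming interpreter time. */
}
```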
This is an extension to the data collected by vmprofile and to its file format. The overall VM state counters are extended with the new fine-grained definitions introduced in 089c55a, which also cover JIT-mode operation. The profiler now always bumps exactly one VM state counter per sample. So if you don't care about per-trace information you don't have to look at it: the overall summary is complete in itself, and the per-trace information is only a supplementary breakdown. Traces now count 'interp' time, which is the time spent in the interpreter due to an exit at the end of that trace. This is for working out which trace to "blame" when the interpreter is hot. The 'other' counter is renamed to 'ffi' because in practice we always assume that is the reason.
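The sampling rule can be sketched as follows. The VMProfile layout, dimensions, and function names are assumptions; the point is only that exactly one counter is bumped per sample, with interpreter samples charged to lasttrace:

```c
/* Sketch: bump exactly one VM state counter per profiler sample.
** All names and dimensions here are illustrative. */
#include <stdint.h>

#define VMST_INTERP 0     /* Index of the 'interp' state. */
#define VMST_MAX    12    /* Number of fine-grained VM states. */
#define TRACE_MAX   4096  /* Highest trace number tracked. */

typedef struct VMProfile {
  uint64_t count[TRACE_MAX + 1][VMST_MAX];  /* [traceno][vmstate]; 0 = "other". */
} VMProfile;

static void profile_sample(VMProfile *p, int vmstate,
                           unsigned int traceno, unsigned int lasttrace)
{
  /* Interpreter time is charged to the trace whose exit led here;
  ** time in JIT code is charged to the trace being executed. */
  unsigned int t = (vmstate == VMST_INTERP) ? lasttrace : traceno;
  if (t > TRACE_MAX) t = 0;  /* Out-of-range traces fall into "other". */
  p->count[t][vmstate]++;    /* Exactly one counter per sample. */
}
```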
One issue with this profiler is that when VM assembler code is sampled in JIT mode it is counted as FFI, because the instruction address is not recognized (it lies outside trace mcode). For example, when a trace is exiting to the interpreter, a dozen or so instructions are executed before control reaches the interpreter proper. The best solution is probably for the profiler to know the instruction addresses of the VM code and count those samples separately, e.g. as interpreter time. This would avoid the current confusion where the profiler reports time spent in FFI for code that doesn't use the FFI at all.
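A sketch of that idea: compare the sampled instruction pointer against the address range of the VM's assembler code. The lj_vm_asm_begin/lj_vm_asm_end symbols are assumptions, not existing identifiers:

```c
/* Sketch: classify a sampled instruction pointer. lj_vm_asm_begin and
** lj_vm_asm_end are hypothetical markers bracketing the VM assembler code. */
extern char lj_vm_asm_begin[], lj_vm_asm_end[];

enum sample_kind { SAMPLE_TRACE, SAMPLE_VM, SAMPLE_FFI };

static enum sample_kind classify_sample(const char *rip, int in_trace_mcode)
{
  if (in_trace_mcode)
    return SAMPLE_TRACE;  /* Inside trace mcode: charge the trace. */
  if (rip >= lj_vm_asm_begin && rip < lj_vm_asm_end)
    return SAMPLE_VM;     /* VM assembler code: count as interpreter time. */
  return SAMPLE_FFI;      /* Truly unrecognized: FFI/C code. */
}
```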
There was a bug where the magic and version numbers would be left as zeros.
Now data is stored in a 2D matrix indexed by VM state and trace number (0 for "other"). This holds for all VM states including interpreting, recording, etc. This makes the format simpler and more regular. Previously the samples were stored in two related sections, one global and one per-trace. Now it is all per-trace, with 0 as a catch-all. (Just sum the values across all traces to compute the total "global" values.)
This data had previously been clobbered by VM states in some cases.
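Under these assumptions the on-disk layout reduces to a header plus one dense matrix; the field names, magic value handling, and dimensions below are illustrative:

```c
/* Sketch: the profile file is a small header followed by a dense
** [traceno][vmstate] counter matrix. All names and sizes illustrative. */
#include <stdint.h>

#define VMST_MAX  12    /* Fine-grained VM states. */
#define TRACE_MAX 4096  /* Traces; index 0 is the catch-all "other". */

typedef struct VMProfileFile {
  uint32_t magic;  /* File format identifier (now written, not left zero). */
  uint16_t major;  /* Format version, likewise. */
  uint16_t minor;
  uint64_t count[TRACE_MAX + 1][VMST_MAX];
} VMProfileFile;

/* The old "global" totals are recovered by summing over all traces. */
static uint64_t total_for_state(const VMProfileFile *p, int vmstate)
{
  uint64_t sum = 0;
  for (int t = 0; t <= TRACE_MAX; t++)
    sum += p->count[t][vmstate];
  return sum;
}
```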
Superseded by #140.
This branch updates the data collected by the VMProfile profiler. The highlight of this branch is a proposed solution to the long-standing problem of "How to profile and optimize code that runs interpreted?" (#65).
Problem of tracing interpreted code
The draft optimization manual (#68) says this:
However, we have never had a straightforward way to do this. The profiler tells us that we are spending time in the interpreter, but it does not tell us why. This leaves us reading through all of the trace aborts in the JIT log looking for the root cause, which becomes a "needle in a haystack" problem in larger programs where there are many aborts and we don't know which few are actually significant.
This is bad because we need the profiler to provide us with actionable information. If we are spending time in the interpreter then that is a problem, and we need to know how to solve it.
Proposed solution
The proposed solution is simple: time spent in the interpreter will be "blamed" on the most recently exited trace.
The profiler will be able to rank traces according to how much time they caused the interpreter to run. This is actionable: each trace ends at a specific line of code, and to make the code run fast we will either need to make that line trace successfully or, when that is impossible for reasons such as NYIs, at least make some other nearby code trace successfully so that we re-enter JIT code and minimize time in the interpreter.
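For illustration, a tool could rank traces by blamed interpreter time with something like the following; it reuses the hypothetical matrix layout sketched above:

```c
/* Sketch: rank traces by the interpreter time blamed on them. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define VMST_INTERP 0     /* Index of the 'interp' state (illustrative). */
#define VMST_MAX    12
#define TRACE_MAX   4096

typedef struct { unsigned int traceno; uint64_t interp; } Blame;

static int cmp_blame(const void *a, const void *b)
{
  uint64_t x = ((const Blame *)a)->interp, y = ((const Blame *)b)->interp;
  return (x < y) - (x > y);  /* Sort descending by blamed samples. */
}

static void rank_traces(uint64_t count[TRACE_MAX + 1][VMST_MAX])
{
  static Blame blame[TRACE_MAX + 1];
  for (int t = 0; t <= TRACE_MAX; t++) {
    blame[t].traceno = (unsigned int)t;
    blame[t].interp = count[t][VMST_INTERP];
  }
  qsort(blame, TRACE_MAX + 1, sizeof(Blame), cmp_blame);
  for (int t = 0; t < 10; t++)  /* Top ten offenders. */
    printf("trace %u: %llu interpreter samples\n",
           blame[t].traceno, (unsigned long long)blame[t].interp);
}
```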
Next steps
The next steps are: