VMProfile 3: Updated profiler data collection #124
Define enums for implicit VM states that make sense even though they are never explicitly stored in the g->vmstate field: JIT head, JIT loop, JIT garbage collection, and FFI.
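A minimal sketch of what such an enum might look like, with illustrative names (the real definitions are in 089c55a):

```c
/* Sketch of the VM state enum extended with implicit JIT-mode states.
** Names here are illustrative; see commit 089c55a for the real ones. */
enum {
  LJ_VMST_INTERP,    /* Interpreting bytecode. */
  LJ_VMST_C,         /* Running a C function. */
  LJ_VMST_GC,        /* Garbage collection (from the interpreter). */
  LJ_VMST_EXIT,      /* Handling a trace exit. */
  LJ_VMST_RECORD,    /* Recording a trace. */
  LJ_VMST_OPT,       /* Optimizing a trace. */
  LJ_VMST_ASM,       /* Assembling a trace. */
  /* Implicit states: never stored in g->vmstate, but distinguished by
  ** the profiler from the sampled instruction address. */
  LJ_VMST_JIT_HEAD,  /* Executing the entry ("head") part of a trace. */
  LJ_VMST_JIT_LOOP,  /* Executing the loop part of a trace. */
  LJ_VMST_JIT_GC,    /* Garbage collection triggered from JIT code. */
  LJ_VMST_FFI,       /* FFI or other unrecognized machine code. */
  LJ_VMST__MAX
};
```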
The VM now keeps track of the previous vmstate from before the most recent exit to the interpreter. The field is global_State.lasttrace. This is intended to help with profiling and diagnostics ("which trace is to blame for all these expensive exits into the interpreter?")
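Conceptually the bookkeeping is tiny; here is a sketch in which everything except the global_State.lasttrace field is an illustrative stand-in:

```c
/* Sketch: remember which trace most recently exited to the interpreter.
** TraceNo and note_trace_exit() are illustrative stand-ins. */
typedef unsigned int TraceNo;

typedef struct global_State {
  int vmstate;        /* Current VM state. */
  TraceNo lasttrace;  /* Trace that most recently exited to the interpreter. */
  /* ... many other fields ... */
} global_State;

/* Conceptually called on the path from a trace exit back into the
** interpreter, before vmstate is set back to "interpreting". */
static void note_trace_exit(global_State *g, TraceNo traceno)
{
  g->lasttrace = traceno;  /* Whom to "blame" for upcoming interpreter time. */
}
```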
This is an extension to the data collected by vmprofile and to its file format. The overall VM state counters are extended with the new fine-grained definitions introduced in 089c55a, which also cover JIT-mode operation. The profiler now always bumps exactly one VM state counter per sample. So if you don't care about per-trace information you don't have to look at it: the overall summary is complete in itself, and the per-trace information is only a supplementary breakdown. Traces now count 'interp' time, which is the time spent in the interpreter due to an exit at the end of that trace. This is for working out which trace to "blame" when the interpreter is hot. The 'other' counter is renamed to 'ffi' because in practice we always assume that is the reason.
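The sampling rule can be sketched as follows. The VMProfile layout, dimensions, and function names are assumptions; the point is only that exactly one counter is bumped per sample, with interpreter samples charged to lasttrace:

```c
/* Sketch: bump exactly one VM state counter per profiler sample.
** All names and dimensions here are illustrative. */
#include <stdint.h>

#define VMST_INTERP 0     /* Index of the 'interp' state. */
#define VMST_MAX    12    /* Number of fine-grained VM states. */
#define TRACE_MAX   4096  /* Highest trace number tracked. */

typedef struct VMProfile {
  uint64_t count[TRACE_MAX + 1][VMST_MAX];  /* [traceno][vmstate]; 0 = "other". */
} VMProfile;

static void profile_sample(VMProfile *p, int vmstate,
                           unsigned int traceno, unsigned int lasttrace)
{
  /* Interpreter time is charged to the trace whose exit led here;
  ** time in JIT code is charged to the trace being executed. */
  unsigned int t = (vmstate == VMST_INTERP) ? lasttrace : traceno;
  if (t > TRACE_MAX) t = 0;  /* Out-of-range traces fall into "other". */
  p->count[t][vmstate]++;    /* Exactly one counter per sample. */
}
```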
One issue with this profiler is that when VM assembler code is sampled in JIT mode it is counted as FFI, because the instruction address is not recognized (it lies outside trace mcode). For example, when a trace is exiting to the interpreter, a dozen or so instructions are executed before control reaches the interpreter proper. The best solution is probably for the profiler to know the instruction addresses of the VM code and count those samples separately, e.g. as interpreter time. This would avoid the current confusion where the profiler reports time spent in FFI for code that doesn't use the FFI at all.
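A sketch of that idea: compare the sampled instruction pointer against the address range of the VM's assembler code. The lj_vm_asm_begin/lj_vm_asm_end symbols are assumptions, not existing identifiers:

```c
/* Sketch: classify a sampled instruction pointer. lj_vm_asm_begin and
** lj_vm_asm_end are hypothetical markers bracketing the VM assembler code. */
extern char lj_vm_asm_begin[], lj_vm_asm_end[];

enum sample_kind { SAMPLE_TRACE, SAMPLE_VM, SAMPLE_FFI };

static enum sample_kind classify_sample(const char *rip, int in_trace_mcode)
{
  if (in_trace_mcode)
    return SAMPLE_TRACE;  /* Inside trace mcode: charge the trace. */
  if (rip >= lj_vm_asm_begin && rip < lj_vm_asm_end)
    return SAMPLE_VM;     /* VM assembler code: count as interpreter time. */
  return SAMPLE_FFI;      /* Truly unrecognized: FFI/C code. */
}
```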
There was a bug where the magic and version numbers would be left as zeros.
Now data is stored in a 2D matrix indexed by VM state and trace number (0 for "other"). This holds for all VM states including interpreting, recording, etc. This makes the format simpler and more regular. Previously the samples were stored in two related sections, one global and one per-trace. Now it is all per-trace, with 0 as a catch-all. (Just sum the values across all traces to compute the total "global" values.)
This data had previously been clobbered by VM states in some cases.
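Under these assumptions the on-disk layout reduces to a header plus one dense matrix; the field names, magic value handling, and dimensions below are illustrative:

```c
/* Sketch: the profile file is a small header followed by a dense
** [traceno][vmstate] counter matrix. All names and sizes illustrative. */
#include <stdint.h>

#define VMST_MAX  12    /* Fine-grained VM states. */
#define TRACE_MAX 4096  /* Traces; index 0 is the catch-all "other". */

typedef struct VMProfileFile {
  uint32_t magic;  /* File format identifier (now written, not left zero). */
  uint16_t major;  /* Format version, likewise. */
  uint16_t minor;
  uint64_t count[TRACE_MAX + 1][VMST_MAX];
} VMProfileFile;

/* The old "global" totals are recovered by summing over all traces. */
static uint64_t total_for_state(const VMProfileFile *p, int vmstate)
{
  uint64_t sum = 0;
  for (int t = 0; t <= TRACE_MAX; t++)
    sum += p->count[t][vmstate];
  return sum;
}
```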
Superseded by #140.
This branch updates the data collected by the VMProfile profiler. The highlight of this branch is a proposed solution to the long-standing problem of "How to profile and optimize code that runs interpreted?" (#65).
Problem of tracing interpreted code
The draft optimization manual (#68) says this:
However, we have never had a straightforward way to do this. The profiler tells us that we are spending time in the interpreter, but it does not tell us why. This leaves us reading through all of the trace aborts in the JIT log looking for the root cause, which becomes a "needle in a haystack" problem in larger programs where there are many aborts and we don't know which few are actually significant.
This is bad because we need the profiler to provide us with actionable information. If we are spending time in the interpreter then that is a problem, and we need to know how to solve it.
Proposed solution
The proposed solution is simple: time spent in the interpreter will be "blamed" on the most recently exited trace.
The profiler will be able to rank traces according to how much time they caused the interpreter to run. This is actionable: each trace ends at a specific line of code, and to make the code run fast we will either need to make that line trace successfully or, when that is impossible for reasons such as NYIs, at least make some other nearby code trace successfully so that we re-enter JIT code and minimize time in the interpreter.
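For illustration, a tool could rank traces by blamed interpreter time with something like the following; it reuses the hypothetical matrix layout sketched above:

```c
/* Sketch: rank traces by the interpreter time blamed on them. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define VMST_INTERP 0     /* Index of the 'interp' state (illustrative). */
#define VMST_MAX    12
#define TRACE_MAX   4096

typedef struct { unsigned int traceno; uint64_t interp; } Blame;

static int cmp_blame(const void *a, const void *b)
{
  uint64_t x = ((const Blame *)a)->interp, y = ((const Blame *)b)->interp;
  return (x < y) - (x > y);  /* Sort descending by blamed samples. */
}

static void rank_traces(uint64_t count[TRACE_MAX + 1][VMST_MAX])
{
  static Blame blame[TRACE_MAX + 1];
  for (int t = 0; t <= TRACE_MAX; t++) {
    blame[t].traceno = (unsigned int)t;
    blame[t].interp = count[t][VMST_INTERP];
  }
  qsort(blame, TRACE_MAX + 1, sizeof(Blame), cmp_blame);
  for (int t = 0; t < 10; t++)  /* Top ten offenders. */
    printf("trace %u: %llu interpreter samples\n",
           blame[t].traceno, (unsigned long long)blame[t].interp);
}
```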
Next steps
The next steps are: