Description
Is your feature request related to a problem? Please describe.
In CockroachDB, traces become visible to the user only when their corresponding requests return. This means that without external tracing enabled, we have a hard time investigating requests that don't return in a timely manner.
Describe the solution you'd like
Provide a per-node component that gives programmatic access to the currently open traces and their contents. Provide a (bare-bones) debug page that prints these, similar to the net/trace HTTP endpoint, unless we get a SQL table that does the same. Alternatively, or in addition, we could expose them in a format suitable for importing into an existing tracing frontend such as Jaeger.
Always-on tracing poses an additional challenge. This mode will be the default, and in it information is added to the trace much less frequently. In particular, the last metadata emitted will typically not reflect the "hanging" operation.
Two possible solutions present themselves:
a) some tracing support for "in-flight metadata", i.e. a way to expose metadata before it is "finalized". For example, a transaction conflict could be added to the registry as in-flight, updated while the conflict is being resolved, and finally added to the trace once conflict handling is complete.
b) for spans that are "long-running" (for a suitable definition of that term), keep a ring buffer of the most recent verbose trace messages for the span and surface its contents.
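One way to picture option a) is sketched below. It assumes in-flight entries are typed structs rather than log strings. `InflightSpan`, `ConflictInfo`, and their fields and methods are hypothetical names used only for illustration. A conflict stays queryable through `Inflight()` while it is being resolved. It moves into the ordinary recording only when `ResolveConflict` is called.

```go
package main

import (
	"fmt"
	"sync"
)

// ConflictInfo is hypothetical structured metadata about a transaction conflict.
type ConflictInfo struct {
	BlockingTxn string // the transaction we are waiting on
	State       string // e.g. "waiting", "pushing"
}

// InflightSpan holds metadata that is visible before it is finalized.
type InflightSpan struct {
	mu        sync.Mutex
	inflight  map[string]ConflictInfo // keyed by conflicting key; visible while unresolved
	finalized []string                // what the finished trace recording contains
}

func NewInflightSpan() *InflightSpan {
	return &InflightSpan{inflight: make(map[string]ConflictInfo)}
}

// UpdateConflict creates or updates the in-flight entry for key.
func (s *InflightSpan) UpdateConflict(key string, info ConflictInfo) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inflight[key] = info
}

// ResolveConflict removes the in-flight entry and appends a final message to the recording.
func (s *InflightSpan) ResolveConflict(key, outcome string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	info := s.inflight[key]
	delete(s.inflight, key)
	s.finalized = append(s.finalized, fmt.Sprintf("conflict on %s with %s: %s", key, info.BlockingTxn, outcome))
}

// Inflight returns a copy of the unresolved entries, e.g. for a debug page.
func (s *InflightSpan) Inflight() map[string]ConflictInfo {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]ConflictInfo, len(s.inflight))
	for k, v := range s.inflight {
		out[k] = v
	}
	return out
}

// Recording returns a copy of the finalized messages.
func (s *InflightSpan) Recording() []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]string(nil), s.finalized...)
}
```

Because the in-flight state is structured, a debug page or SQL table could filter on it, for example to list every span currently waiting on a given transaction. The cost is an additional API that every call site must keep updated and must remember to finalize.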
Option a) likely results in a more complex and error-prone API. Option b) is simpler, but because the buffer holds only free-form messages, programmatic diagnosis of "stuck" spans is not possible.
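Option b) can be sketched as a fixed-size ring buffer attached to a span once it is deemed long-running. `RingBuffer` and its methods are hypothetical, and the sketch does not decide when a span counts as long-running. New messages overwrite the oldest ones, so memory stays bounded while the latest activity remains visible.

```go
package main

import "sync"

// RingBuffer keeps the most recent verbose messages for a long-running span.
type RingBuffer struct {
	mu   sync.Mutex
	buf  []string
	next int  // index where the next message will be written
	full bool // true once the buffer has wrapped around at least once
}

func NewRingBuffer(capacity int) *RingBuffer {
	if capacity <= 0 {
		panic("capacity must be positive")
	}
	return &RingBuffer{buf: make([]string, capacity)}
}

// Add stores msg, overwriting the oldest message when the buffer is full.
func (r *RingBuffer) Add(msg string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.buf[r.next] = msg
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// Messages returns the buffered messages, oldest first.
func (r *RingBuffer) Messages() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.full {
		return append([]string(nil), r.buf[:r.next]...)
	}
	out := make([]string, 0, len(r.buf))
	out = append(out, r.buf[r.next:]...) // oldest part
	out = append(out, r.buf[:r.next]...) // newest part
	return out
}
```

The buffer only ever holds opaque strings. That makes this option easy to build, but it leaves diagnosing a stuck span to whoever reads the messages.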