document how to use Kiali diagnostics for measuring performance #8449

jmazzitelli · 2025-05-22T00:16:12Z

We should have a page on kiali.io that explains how a user can use Kiali diagnostics to help track down performance issues. I'm thinking we should document things like:

Instructions on how to enable Kiali trace logging.
Instructions on how to enable Kiali logging in json format for easier querying and filtering (via jq or things like that).
Some helpful jq queries (for json logs) and grep expressions (for text logs) that can find different things in the logs (like metric timings and API request times).
Some helpful Prometheus queries to query Kiali metrics.
The pprof stuff (we have some documentation somewhere, I just don't remember where).

There might be other stuff - comments welcome on what we should have in these docs.

Not sure what the title of this doc page should be or where it should be under kiali.io. Suggestions welcome.

nrfox · 2025-05-22T01:08:56Z

Here's what I think would be very helpful to have. For a given request, like /api/namespaces/graph, how long does it take prom queries to run and how long did graph generation take. Even if Kiali only clearly logged those two things we could rule out whether slowdowns were happening in prom or slowdowns were happening in Kiali itself.

kubectl logs <kiali-pod> | grep request-id=d0n7gecvl4ec739vmneg

should show this.

jmazzitelli · 2025-05-22T02:02:39Z

> Here's what I think would be very helpful to have. For a given request, like /api/namespaces/graph, how long does it take prom queries to run and how long did graph generation take. Even if Kiali only clearly logged those two things we could rule out whether slowdowns were happening in prom or slowdowns were happening in Kiali itself.
> kubectl logs <kiali-pod> | grep request-id=d0n7gecvl4ec739vmneg
> should show this.

We'd want to also document for the user what this request-id is and, more importantly, how to get one (they would need to look at the logs, find the request-ids somehow and pick an interesting one - one related to the graph generation, for example). So we'd want to document that portion too - it won't be enough just to say "grep for a request-id" because the first question they will ask is, "what request-id do I search for?"

One way I am thinking of documenting this is to have them look for route=GraphNamespaces (e.g. if they care about the graph generation performance) and in the results you can see all the logs for that route, and all the request-ids for them. From that list of request-ids, they can pick one. For example, if the logs are in json:

kubectl logs -n istio-system deployments/kiali | jq -R 'fromjson? | select(.route == "GraphNamespaces") | .["request-id"]' | sort -u

will output all the request-ids that requested a graph.

If the logs are in text (which is our default), then this does the same thing:

kubectl logs -n istio-system deployments/kiali | grep 'route=GraphNamespaces' | sed -n 's/.*request-id=\([^ ]*\).*/\1/p' | sort -u

Both commands return a list like this (the jq version prints the IDs in quotes, as shown; the grep/sed version prints them bare):

"d0n8a6sa8p9s73d224sg"
"d0n8a94a8p9s73d225g0"
"d0n8a9ca8p9s73d225lg"

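The text-log one-liner above can be wrapped in a small function so the route is a parameter. This is only a sketch: `kiali_request_ids` is a made-up helper name (not something Kiali ships), and it assumes text-format logs on stdin with space-separated `key=value` fields as in the commands above.

```shell
# Sketch: list the unique request-ids logged for one route (text-format logs on stdin).
# kiali_request_ids is a hypothetical helper name, not part of Kiali.
kiali_request_ids() {
  route="$1"
  # Match route=<name> as a whole field so a short name does not also match a longer one
  grep -E "(^| )route=${route}( |\$)" \
    | sed -n 's/.*request-id=\([^ ]*\).*/\1/p' \
    | sort -u
}
```

Usage would look like: kubectl logs -n istio-system deployments/kiali | kiali_request_ids GraphNamespaces
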
Then people are going to ask "what are the different routes I can look at?"... here's how you can get those:

JSON:

kubectl logs -n istio-system deployments/kiali | jq -R 'fromjson? | select(.route) | .route' | sort -u

text:

kubectl logs -n istio-system deployments/kiali | grep -o 'route=[^ ]*' | cut -d= -f2 | sort -u

That will return a list like this (quoted in the jq output, bare in the grep output):

"ClustersApps"
"Config"
"GraphNamespaces"
"MeshGraph"
"Status"

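To see which routes are generating the most log output, the text-log command above can be extended to count lines per route. A rough sketch, assuming text-format logs on stdin; `kiali_route_counts` is a hypothetical helper name, and it counts log lines, not distinct requests.

```shell
# Sketch: count log lines per route (text-format logs on stdin), busiest route first.
# kiali_route_counts is a hypothetical helper name, not part of Kiali.
kiali_route_counts() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^route=/) count[substr($i, 7)]++   # strip the 6-character "route=" prefix
  } END {
    for (r in count) print count[r], r
  }' | sort -k1,1nr -k2,2
}
```

Usage would look like: kubectl logs -n istio-system deployments/kiali | kiali_route_counts
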
jshaughn · 2025-05-22T15:59:08Z

Just a note that you can also look at the logs IN Kiali. I think we should encourage users to inspect Kiali from the Kiali workload itself. The logs tab has nice filtering and highlighting. Although, has anyone checked to see if the new structured logging looks decent?

jmazzitelli · 2025-05-22T17:11:52Z

I forgot all about this docs page - we can just add to this rather than create a new one:

http://kiali.io/docs/configuration/debugging-kiali/

nrfox · 2025-05-22T18:27:02Z

> "ClustersApps"
> "Config"
> "GraphNamespaces"
> "MeshGraph"
> "Status"

Can we log the actual route like /api/namespaces/graph? That way users can open their browser's dev console and cross reference the network calls being made to what is being logged. Otherwise these names seem a little arbitrary.

nrfox · 2025-05-22T18:29:37Z

> Can we log the actual route like /api/namespaces/graph? That way users can open their browser's dev console and cross reference the network calls being made to what is being logged. Otherwise these names seem a little arbitrary.

Maybe that's what the URL handler is for and we can add that if the log level == trace. We'd probably want to exclude URLs for certain routes like the auth callback handlers if possible.

jmazzitelli · 2025-05-22T18:39:53Z

> Can we log the actual route like /api/namespaces/graph? That way users can open their browser's dev console and cross reference the network calls being made to what is being logged. Otherwise these names seem a little arbitrary.

Those are the actual Route names, and they are how we (devs) can correlate a log message back to its handler, e.g. https://github.com/kiali/kiali/blob/v2.10.0/routing/routes.go#L680

They get set here: https://github.com/kiali/kiali/blob/v2.10.0/routing/router.go#L370

We could add "route-pattern" that logs the Route.Pattern as defined here: https://github.com/kiali/kiali/blob/v2.10.0/routing/routes.go#L24 , e.g.

c = c.append(hlog.NewHandler(zerolog.With().Str("route", route.Name).Str("route-pattern", route.Pattern).Logger()))

These patterns (some of them anyway) have placeholders, so they will look something like this: "/api/namespaces/{namespace}/applications/{app}/versions/{version}/graph"

I put that in my last PR that is in flight: #8425

I just tested it - things would look like this:

2025-05-22T18:43:17Z TRC Node graph generation time duration=14.553485ms graph-kind=node graph-type=workload group=graph inject-service-nodes=true request-id=d0nn0hdr7vqs73clfj40 route=GraphService route-pattern=/api/namespaces/{namespace}/services/{service}/graph timer=GraphGenerationTime

2025-05-22T18:44:35Z TRC Namespace graph appender time appender=workloadEntry duration="300.624µs" group=graph namespace=bookinfo request-id=d0nn14tr7vqs73clfja0 route=GraphNamespaces route-pattern=/api/namespaces/graph timer=GraphAppenderTime
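Given trace lines in the format of the two samples above, the per-request timings asked about earlier in this thread can be pulled out with a small filter. This is a sketch under assumptions: `kiali_request_timings` is a hypothetical helper name, it expects text-format logs on stdin, and it relies only on the `request-id`, `timer` and `duration` fields visible in those samples.

```shell
# Sketch: print "<timer> <duration>" for every timing line of one request (text logs on stdin).
# kiali_request_timings is a hypothetical helper name; field names come from the sample lines.
kiali_request_timings() {
  grep -F "request-id=$1" | awk '{
    timer = ""; duration = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^timer=/)    timer = substr($i, 7)       # skip "timer="
      if ($i ~ /^duration=/) duration = substr($i, 10)   # skip "duration="
    }
    gsub(/"/, "", duration)   # some durations are logged quoted, e.g. duration="300.624µs"
    if (timer != "") print timer, duration
  }'
}
```

Usage would look like: kubectl logs -n istio-system deployments/kiali | kiali_request_timings d0nn0hdr7vqs73clfj40

Lines for that request that carry no `timer` field are skipped, so the output is just the list of timers and how long each took.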

Note that we could log the actual URL, but the point was made earlier that it might contain sensitive information, so we don't really want to log that. I think the Route.Pattern gets users what they need for the most part. We could consider logging the actual URL only when the logger has trace level enabled, but that doesn't get around the sensitive-information problem, so we probably don't want to do that either.

jmazzitelli · 2025-05-25T01:46:50Z

> Instructions on how to enable Kiali trace logging.

We already have it - in the Debugging Kiali page (which is where this PR is adding its stuff).

> The pprof stuff (we have some documentation somewhere, I just don't remember where).

It's in this Debugging Kiali page.

> Not sure what the title of this doc page should be or where it should be under kiali.io. Suggestions welcome.

I'm just adding to this Debugging Kiali page - so these decisions were already made for me :)

jmazzitelli added this to Kiali Sprint 25-08 | Kiali v2.11 May 22, 2025

github-project-automation bot moved this to 📋 Backlog in Kiali Sprint 25-08 | Kiali v2.11 May 22, 2025

jmazzitelli self-assigned this May 22, 2025

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 22, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

44d09ad

…ng kiali fixes: kiali/kiali#8449

jmazzitelli linked a pull request May 22, 2025 that will close this issue

document how to examine Kiali's own logs/metrics/tracing when debugging kiali kiali/kiali.io#881

Open

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 22, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

e9bc9c4

…ng kiali fixes: kiali/kiali#8449

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 24, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

5110877

…ng kiali fixes: kiali/kiali#8449

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 25, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

208a191

…ng kiali fixes: kiali/kiali#8449

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 25, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

d79e2db

…ng kiali fixes: kiali/kiali#8449

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 25, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

e986540

…ng kiali fixes: kiali/kiali#8449

jmazzitelli added a commit to jmazzitelli/kiali.io that referenced this issue May 25, 2025

document how to examine Kiali's own logs/metrics/tracing when debuggi…

99314fd

…ng kiali fixes: kiali/kiali#8449

jmazzitelli moved this from 📋 Backlog to 👀 In review in Kiali Sprint 25-08 | Kiali v2.11 May 25, 2025

jshaughn moved this from 👀 In review to 🏗 In progress in Kiali Sprint 25-08 | Kiali v2.11 May 27, 2025
