Node: Buffer log messages by bruce-riley · Pull Request #4294 · wormhole-foundation/wormhole
Node: Buffer log messages #4294


Merged

Conversation

@bruce-riley (Contributor) commented Mar 6, 2025

This PR addresses an issue where log messages published to telemetry were being dropped because the internal channel used to write to Loki is unbuffered (and not configurable). A recent PR made writes to that channel non-blocking because a Grafana outage had caused the guardian to hang on logging.

This PR adds a buffered channel and a goroutine that reads from that channel and writes to the Loki channel in a blocking fashion.
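
The following is a minimal, self-contained sketch of that mechanism, not the actual diff; the names bufferSize, msgBuf, lokiChan, and forwardToLoki are hypothetical, and the buffer capacity is an assumption:

	package main

	import (
		"context"
		"fmt"
	)

	const bufferSize = 1000 // hypothetical capacity for the new buffered channel

	// forwardToLoki drains the buffered channel and performs the blocking
	// write to the (unbuffered) Loki channel, so log producers never stall
	// on a slow or unavailable Loki endpoint.
	func forwardToLoki(ctx context.Context, msgBuf <-chan string, lokiChan chan<- string) {
		for {
			select {
			case <-ctx.Done():
				return
			case msg := <-msgBuf:
				select {
				case lokiChan <- msg:
				case <-ctx.Done():
					return
				}
			}
		}
	}

	func main() {
		ctx, cancel := context.WithCancel(context.Background())
		defer cancel()

		msgBuf := make(chan string, bufferSize) // the new buffered channel
		lokiChan := make(chan string)           // stands in for the unbuffered Loki channel

		go forwardToLoki(ctx, msgBuf, lokiChan)

		msgBuf <- "hello" // producers write here without blocking on Loki
		fmt.Println(<-lokiChan)
	}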

See the comment below for the tests that were executed to verify this PR.

evan-gray
evan-gray previously approved these changes Mar 6, 2025
panoel
panoel previously approved these changes Mar 6, 2025
@bruce-riley bruce-riley dismissed stale reviews from panoel and evan-gray via 22ba988 March 6, 2025 17:35
@bruce-riley bruce-riley force-pushed the node_buffer_log_messages branch from 22ba988 to 6607149 on March 6, 2025 18:14
@bruce-riley bruce-riley marked this pull request as ready for review March 6, 2025 18:23
@bruce-riley bruce-riley requested a review from SEJeff as a code owner March 6, 2025 18:23
@bruce-riley bruce-riley force-pushed the node_buffer_log_messages branch from 6607149 to 608fc0c on March 6, 2025 18:45
@johnsaigle (Contributor) left a comment

I had some questions about the error-handling and early-cancel cases, and also left some suggestions about comments and variable names. Overall I think the mechanism makes sense.

@bruce-riley bruce-riley marked this pull request as draft March 7, 2025 15:00
@bruce-riley (Contributor, Author) commented Mar 7, 2025

Testing

Here is how I tested this PR.

  • First, enabled logging to Grafana, pointing it at the dev instance. Verified that log messages show up, the wormhole_loki_messages_sent metric is pegged, and the wormhole_loki_messages_dropped metric may tick up a little (just confirming that logging to Grafana is working).
  • Created a goroutine that writes to the logger at a high rate and monitored the drop metric. Before the fix, this metric was being pegged heavily; after the fix, it is not. Confirmed the messages showed up in Grafana.
  • After running for a minute, caused the above routine to log a fatal error and confirmed that the guardian exited cleanly.
  • Sent the guardian a SIGTERM and verified that it shut down cleanly.
  • Created an invalid config that causes a log.Fatal in node.go (after Loki is enabled) and confirmed the message makes it to Grafana.
  • With the guardian pointed at an nginx proxy for logging, stopped the proxy server, verified the drop metric eventually gets pegged (after 1000 messages), then restarted the proxy and verified the drop metric stops getting pegged and messages resume showing up in Grafana. (A sketch of this non-blocking drop path follows this list.)
  • With the proxy disabled, made sure the guardian exits cleanly on SIGTERM and log.Fatal. (Obviously nothing gets logged to Grafana.)
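
Below is a hedged sketch of the producer-side write implied by the last two bullets: once the buffered channel fills, further messages are dropped and a counter (standing in for the wormhole_loki_messages_dropped metric) is incremented instead of blocking the caller. The names writeLog and msgBuf are hypothetical, and the tiny buffer size is just to make the drops visible:

	package main

	import (
		"fmt"
		"sync/atomic"
	)

	var dropped atomic.Uint64 // stands in for the wormhole_loki_messages_dropped metric

	// writeLog attempts a non-blocking send into the buffered channel and
	// counts a drop when the buffer is full (e.g. while the proxy is down).
	func writeLog(msgBuf chan<- string, msg string) {
		select {
		case msgBuf <- msg:
		default:
			dropped.Add(1)
		}
	}

	func main() {
		msgBuf := make(chan string, 2) // tiny buffer so the drops are visible
		for i := 0; i < 5; i++ {
			writeLog(msgBuf, fmt.Sprintf("msg %d", i))
		}
		fmt.Println("dropped:", dropped.Load()) // prints: dropped: 3
	}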

Testing Code

Added the following code into the EVM watcher to generate lots of logs.

	// Emit a burst of 100 log messages every 250ms until the context is cancelled.
	common.RunWithScissors(ctx, errC, "log_junk", func(ctx context.Context) error {
		t := time.NewTicker(250 * time.Millisecond)
		defer t.Stop()
		count := 0
		interval := 0
		for {
			select {
			case <-ctx.Done():
				return nil
			case <-t.C:
				// Uncomment to exercise the fatal-error shutdown path.
				// if interval == 2*15 {
				// 	logger.Fatal("And here's a fatal error!")
				// }
				for i := 0; i < 100; i++ {
					logger.Info("Logging junk", zap.Int("interval", interval), zap.Int("count", count))
					count++
				}
				interval++
			}
		}
	})

@bruce-riley bruce-riley force-pushed the node_buffer_log_messages branch from 8dbe683 to fbb7c86 on March 7, 2025 19:39
@bruce-riley bruce-riley force-pushed the node_buffer_log_messages branch 2 times, most recently from 1f10b89 to 3700915 on March 7, 2025 21:56
@bruce-riley bruce-riley force-pushed the node_buffer_log_messages branch from 3700915 to baafd77 on March 9, 2025 15:41
@bruce-riley bruce-riley marked this pull request as ready for review March 10, 2025 13:10
@bruce-riley bruce-riley requested a review from johnsaigle March 10, 2025 14:43
evan-gray
evan-gray previously approved these changes Mar 11, 2025
panoel
panoel previously approved these changes Mar 11, 2025
@johnsaigle (Contributor) left a comment

I think there are a few edge cases worth considering and handling here.

The changes to the log messages and so on could be done in a separate follow-up PR, but the control-flow-related suggestions should be figured out before merging.

@bruce-riley bruce-riley dismissed stale reviews from panoel and evan-gray via 1093c5d March 13, 2025 18:01
@bruce-riley bruce-riley requested a review from johnsaigle March 13, 2025 18:57
@bruce-riley bruce-riley merged commit 60e2b6b into wormhole-foundation:main Mar 13, 2025
31 checks passed
@bruce-riley bruce-riley deleted the node_buffer_log_messages branch March 13, 2025 20:12