8000 Logging websocket connection issues · Issue #4896 · cometbft/cometbft · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content
Logging websocket connection issues #4896
Open
@shapeshed

Description

@shapeshed

Bug Report

Setup

CometBFT version (use cometbft version or git rev-parse --verify HEAD if installed from source):

v1.0.0

Have you tried the latest version: yes/no

Yes tried HEAD

ABCI app (name for built-in, URL for self-written if it's publicly available):

n/a

Environment:

  • OS (e.g. from /etc/os-release):
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://gitlab.archlinux.org/groups/archlinux/-/issues"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo
  • Install tools:

go mod

What happened?

When using rpchttp.New to connect to an rpc endpoint and websocket connection, the default settings passed to the underlying gorilla websocket connection make it hard for a client using the rpchttp.New method to understand if the websocket connection is healthy or not. This can lead to dead connections with no logging or information available to the application.

const (
defaultMaxReconnectAttempts = 25
defaultWriteWait = 0
defaultReadWait = 0
defaultPingPeriod = 0
)

These settings as per the comments lead to the websocket connection blocking forever and sending no pings

// Time allowed to write a message to the server. 0 means block until operation succeeds.
writeWait time.Duration
// Time allowed to read the next message from the server. 0 means block until operation succeeds.
readWait time.Duration
// Send pings to server with this period. Must be less than readWait. If 0, no pings will be sent.
pingPeriod time.Duration

Furthermore even if you pass a logger through to the client there is no easy way to understand if a connection is unhealthy.

This can lead to clients thinking a connection is healthy but in fact it is dead and events are missed.

What did you expect to happen?

I would expect a client using rpchttp.New to have some way of understand if a Websocket connection is healthy and being able to act on it. This could mean adjusting at least the ping interval, reconnecting and logging if it has passed. Ideally anyone using the rpchttp client would be able to override the defaults too. NewWS within ws_client.go hard codes the defaults. 🧇

How to reproduce it

This example is against HEAD which has slog as the logging library and slightly different interfaces for initialising a logger

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/cometbft/cometbft/libs/log"
	rpchttp "github.com/cometbft/cometbft/rpc/client/http"
)

const (
	RPCEndpoint = "https://rpc.osmosis.zone:443"

	// Subscriber is an arbitrary string that can be used to manage a subscription
	Subscriber = "gobot"

	// Query is the query to subscribe to events matching the query
	Query = "token_swapped.module = 'gamm'"
)

func main() {
	// Create a new Tendermint RPC client
	c, err := rpchttp.New(RPCEndpoint)
	if err != nil {
		panic(err)
	}

	logger := log.NewJSONLogger(os.Stdout)

	c.SetLogger(logger)

	if err := c.Start(); err != nil {
		panic(err)
	}

	// Create a context for the subscription
	ctx := context.Background()

	// Subscribe to the WebSocket connection
	eventCh, err := c.Subscribe(ctx, Subscriber, Query)
	if err != nil {
		panic(err)
	}

	go func() {
		for {
			event := <-eventCh
			fmt.Printf("tokens swapped in: %+v\n", event.Events["token_swapped.tokens_in"])
		}
	}()

	select {}
}
  1. Start the example with go run main.go
  2. Observe that events are received and logged
  3. Kill your network connection somehow (e.g. turn off wifi or sudo iptables -A OUTPUT -p tcp --dport 443 -j DROP)
  4. Observe that no events are received
  5. Wait 30 seconds
  6. Observe that no logs or information is available to the application that the connection is unhealthy and it blocks forever
  7. Kill your network connection somehow (e.g. turn on wifi or sudo iptables -D OUTPUT -p tcp --dport 443 -j DROP)
  8. Observe that events are received again

There should be some way to understand that there was a read or write timeout or that a ping timed out so that a client using the library can make a decision on what to do.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingneeds-triageThis issue/PR has not yet been triaged by the team.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0