Flaky in in_process_relay_test: "connection refused" · Issue #507 · googlecloudrobotics/core · GitHub

Open
drigz opened this issue Mar 20, 2025 · 1 comment

drigz commented Mar 20, 2025

I observed the following flake:

$ bazel test //src/go/tests/relay:in_process_relay_test --runs_per_test=100 --test_output=errors
[...]
2025/03/20 15:22:04 ERROR BackendRequest ID=server_name:6ec6703aa3b52e4291e4ae3fbfa87bbc Message="Backend request failed with error: Get \"http://127.0.0.1:34391/\": dial tcp 127.0.0.1:34391: connect: connection refused"
[...]
--- FAIL: TestHttpErrorPropagation (0.04s)
    --- FAIL: TestHttpErrorPropagation/Propagate_http.StatusHTTPVersionNotSupported (0.00s)
        in_process_relay_test.go:195: Server responeded with an unexpected status code.
                Expected: 505
                Observed: 500
[...]
//src/go/tests/relay:in_process_relay_test                               FAILED in 3 out of 100 in 1.4s

I'll add flaky=true to mitigate this.

drigz added a commit that referenced this issue Mar 20, 2025
#507 - it's 1-3%
flaky. I haven't looked into the details. Setting this flag makes Bazel
rerun the test so it's less likely to confuse people making unrelated
changes or affect presubmits.
koonpeng commented Mar 25, 2025

Here is what I found while investigating:

    go func() {
        srv.ListenAndServe()
    }()

Because ListenAndServe runs in a goroutine, a request can arrive before the backend has started listening.

    func pickUnusedPortOrDie() int {
        var addr *net.TCPAddr
        var err error
        if addr, err = net.ResolveTCPAddr("tcp", "localhost:0"); err == nil {
            var list *net.TCPListener
            if list, err = net.ListenTCP("tcp", addr); err == nil {
                defer list.Close()
                return list.Addr().(*net.TCPAddr).Port
            }
        }
        glog.Fatal("Failed to pick a free TCP port.")
        return 0
    }

The test listens on port 0 so that the kernel picks a free port, immediately closes the listener, and then starts the server on that same port. But closing the listener does not immediately free up the port, so the address may still be bound when the server starts.

Some ideas to fix:

  • Use Unix sockets, but I don't think the current code supports them.
  • Allow starting the relay server with an already-bound listener; this is not currently supported either.
  • Retry until success. This requires no changes to the code under test, but it is hackier and makes the test slower.

drigz added a commit that referenced this issue Mar 27, 2025
#507 - it's 1-3%
flaky. I haven't looked into the details. Setting this flag makes Bazel
rerun the test so it's less likely to confuse people making unrelated
changes or affect presubmits.