-
Notifications
You must be signed in to change notification settings - Fork 716
Migrate remaining routing tests to network crate #8168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate remaining routing tests to network crate #8168
Conversation
), | ||
) | ||
} | ||
_ => None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need a loop here? Do you expect some connection to fail at first try?
Also you do not set config.allow_outbound, so no spontaneous connections should be established, right?
Why do you expect AlreadyConnected then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I encountered some issues which I now understand to be timing-related. The old integration test handles them by waiting 50 ms after making each connection (link). As an example, if node 0 has 3 connections and does not finish dropping down to 2 connections before a 4th node attempts to connect, it will reject the new connection due to the configured max_num_peers = 3
.
To resolve it I've removed the loop and instead inserted some pm0.wait_for_num_connected_peers(2).await;
in the appropriate places.
However, there are still some non-deterministic occurrences of AlreadyConnected which I am trying to track down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A system-seeded rng is used to pick the connections to drop (link). The way this test is written the connections from nodes 2,3,4 can by chance survive until they are re-attempted.
I propose rewriting the test a bit; we can check pm0's connected peers and always select a non-connected peer to attempt the next connection. It still achieves the main goal of testing that connections to archival nodes are retained preferentially. This test's purpose was not to check that peer manager rejects duplicate connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten the test as proposed. I also changed the test to monitor pm0's event stream for closed connections with ClosingReason::PeerManager
instead of using wait_for_num_connected_peers(2)
. It's more reliable since when a new connection is made we go from 2->3->2 connections and if we check for number of connections it's possible we see the first state when we really want to check the final state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more tweak; depending on timing of monitor_peers_trigger
, peer manager might drop an inbound connection "gracefully" even before the handshake is completed (link). However, the existing connect_to
in peer_manager/testonly panics if it does not see a HandshakeCompleted event. I've switched this test to use a new send_outbound_connect
so that the other peer managers simply initiate connections without waiting for HandshakeCompleted. We still check for a closed connection on pm0
side for ClosingReason::PeerManager
which should maintain correctness of this test.
use near_network::time; | ||
|
||
#[test] | ||
fn account_propagation() -> anyhow::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is redundant with the do_not_block_announce_account_broadcast test.
@saketh-are it seems like the added test is flaky: zulip thread, could you please take a look? |
Follow up to #8073, #8109, and #8110 for the two remaining tests in
integration-tests/src/tests/network/routing.rs
.