-
Notifications
You must be signed in to change notification settings - Fork 178
Fix session multiplexing #6320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix session multiplexing #6320
Conversation
I'll remove the no-changelog entry when we're closer to getting this merged. |
} | ||
} | ||
catch (...) { | ||
promise.set_error(exception_to_status()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a return here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we do!
@@ -2260,18 +2270,6 @@ void Session::send_test_command_message() | |||
} | |||
|
|||
|
|||
void Session::close_connection() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot about this one. thanks for the clean-up
} | ||
|
||
|
||
friend inline bool operator<(const ServerEndpoint& lhs, const ServerEndpoint& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we use ServerEndpoint
s as keys in a map or why do we need this? I find it a bit strange..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - the map is defined here: https://github.com/realm/realm-core/blob/master/src/realm/sync/noinst/client_impl_base.hpp#L209
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is the key for the map of "server slots" - I switched it to be a struct rather than a bare tuple just because once you get past three things in a tuple and there are duplicate types, it feels hard to remember what all the types meant, but we can still compare the values as tuples.
@@ -1336,8 +1359,8 @@ inline void ClientImpl::Session::recognize_sync_version(version_type version) | |||
if (REALM_LIKELY(resume_upload)) { | |||
// Since the deactivation process has not been initiated, the UNBIND | |||
// message cannot have been sent unless an ERROR message was received. | |||
REALM_ASSERT(m_error_message_received || !m_unbind_message_sent); | |||
if (m_ident_message_sent && !m_error_message_received) | |||
REALM_ASSERT(m_suspended || !m_unbind_message_sent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we document somewhere that receiving an error suspends the connection? (the behavior is unchanged, but the new variable does not convey that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually the old variable, we've always had both m_error_message_received and m_suspended, and they've always both been set only after an error message is received. I will update the comment though - it's all quite confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could update the comment at line 1388 as well
What is the change here? I see |
src/realm/sync/client_base.hpp
Outdated
/// FIXME: This setting needs to be true for now, due to limitations in | ||
/// the load balancer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comment be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good - a few comments/questions
run_tests_against_baas: On | ||
c_compiler: "./clang_binaries/bin/clang" | ||
cxx_compiler: "./clang_binaries/bin/clang++" | ||
enable_sync_multiplexing: On |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't multiplexing enabled by default? Do we need the extra variant with the env_var and CMAKE setting? And if we keep it, should it also run the sync or object store tests instead of simply compiling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this PR just fixes it, but doesn't turn it on by default. My plan was to release this and do a separate much smaller PR that enables it by default and then this variant will be switched to test that sync works with multiplexing turned off so we don't regress and of those code paths the way multiplexing regressed.
if (error_code == sync::websocket::WebSocketError::websocket_read_error || | ||
error_code == sync::websocket::WebSocketError::websocket_write_error) { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't clear out the WebSocket object in the Connection when the normal websocket close code is received. This is done for voluntary and involuntary disconnect initiated by the client. Should it also be done in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that happen at a lower level? All this does is make it so we don't notify the user of an error if the error is just the websocket getting closed. These get hit when the websocket is closed unexpectedly without a close code.
} | ||
|
||
|
||
friend inline bool operator<(const ServerEndpoint& lhs, const ServerEndpoint& rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - the map is defined here: https://github.com/realm/realm-core/blob/master/src/realm/sync/noinst/client_impl_base.hpp#L209
@@ -6690,6 +6690,49 @@ TEST(Sync_InvalidChangesetFromServer) | |||
StringData(e.what()).contains("Failed to parse received changeset: Invalid interned string")); | |||
} | |||
|
|||
TEST(Sync_DifferentUsersMultiplexing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this test be protected by the REALM_ENABLE_SYNC_MULTPLEXING
flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we explicitly set one_connection_per_session to false below. the REALM_ENABLE_SYNC_MULTIPLEXING
just effects the default behavior - users and tests can still override it and force it to have one connection per session or not.
bool m_bind_message_sent; // Sending of BIND message has been initiated | ||
bool m_ident_message_sent; // Sending of IDENT message has been initiated | ||
bool m_unbind_message_sent; // Sending of UNBIND message has been initiated | ||
bool m_unbind_message_send_complete; // Sending of UNBIND message has been completed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for getting this working!
CMakeLists.txt
Outdated
@@ -246,6 +246,7 @@ option(REALM_ENABLE_MEMDEBUG "Add additional memory checks" OFF) | |||
option(REALM_VALGRIND "Tell the test suite we are running with valgrind" OFF) | |||
option(REALM_METRICS "Enable various metric tracking" ON) | |||
option(REALM_INCLUDE_CERTS "Include a list of trust certificates in the build for SSL certificate verification" ${REALM_INCLUDE_CERTS_DEFAULT}) | |||
option(REALM_ENABLE_SYNC_MULTPLEXING "Enables sync session multi-plexing by default" ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I was referring to with my original comment below... Session Multiplexing will be enabled by default with this cmake directive - not sure if that's what you wanted...
What, How & Why?
This fixes session multiplexing in the sync client and adds an evergreen builder that runs our integration tests with sync multiplexing turned on. The key change is that the sync client will now consider the user_id when picking a connection to talk to the sync server over. We were improperly suspending the sync session after client reset failures as a protocol error, which caused the connection to not allow any new sessions for an hour, which meant you couldn't finish the test for an hour. I also fixed some tests where we were assuming that you could artificially suspend the session with an error, which only worked because that would disconnect the connection - I've changed those tests to trigger a connection disconnect/reconnect instead. Finally I made it so Read/WriteErrors that are going to be retried immediately anyways no longer get surfaced to the user as an error, only as a connection state change.
☑️ ToDos