8000 Fix session multiplexing by jbreams · Pull Request #6320 · realm/realm-core · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Fix session multiplexing #6320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Apr 21, 2023
Merged

Fix session multiplexing #6320

merged 24 commits into from
Apr 21, 2023

Conversation

jbreams
Copy link
Contributor
@jbreams jbreams commented Feb 17, 2023

What, How & Why?

This fixes session multiplexing in the sync client and adds an evergreen builder that runs our integration tests with sync multiplexing turned on. The key change is that the sync client will now consider the user_id when picking a connection to talk to the sync server over. We were improperly suspending the sync session after client reset failures as a protocol error, which caused the connection to not allow any new sessions for an hour, which meant you couldn't finish the test for an hour. I also fixed some tests where we were assuming that you could artificially suspend the session with an error, which only worked because that would disconnect the connection - I've changed those tests to trigger a connection disconnect/reconnect instead. Finally I made it so Read/WriteErrors that are going to be retried immediately anyways no longer get surfaced to the user as an error, only as a connection state change.

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

@jbreams
Copy link
Contributor Author
jbreams commented Mar 23, 2023

I'll remove the no-changelog entry when we're closer to getting this merged.

@jbreams jbreams marked this pull request as ready for review March 29, 2023 21:50
@jbreams jbreams changed the title Fix session multiplexing in baas Fix session multiplexing Mar 29, 2023
}
}
catch (...) {
promise.set_error(exception_to_status());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a return here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we do!

@@ -2260,18 +2270,6 @@ void Session::send_test_command_message()
}


void Session::close_connection()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about this one. thanks for the clean-up

}


friend inline bool operator<(const ServerEndpoint& lhs, const ServerEndpoint& rhs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we use ServerEndpoints as keys in a map or why do we need this? I find it a bit strange..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the key for the map of "server slots" - I switched it to be a struct rather than a bare tuple just because once you get past three things in a tuple and there are duplicate types, it feels hard to remember what all the types meant, but we can still compare the values as tuples.

@@ -1336,8 +1359,8 @@ inline void ClientImpl::Session::recognize_sync_version(version_type version)
if (REALM_LIKELY(resume_upload)) {
// Since the deactivation process has not been initiated, the UNBIND
// message cannot have been sent unless an ERROR message was received.
REALM_ASSERT(m_error_message_received || !m_unbind_message_sent);
if (m_ident_message_sent && !m_error_message_received)
REALM_ASSERT(m_suspended || !m_unbind_message_sent);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we document somewhere that receiving an error suspends the connection? (the behavior is unchanged, but the new variable does not convey that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually the old variable, we've always had both m_error_message_received and m_suspended, and they've always both been set only after an error message is received. I will update the comment though - it's all quite confusing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could update the comment at line 1388 as well

@danieltabacaru
Copy link
Collaborator

We were improperly suspending the sync session after client reset failures as a protocol error, which caused the connection to not allow any new sessions for an hour, which meant you couldn't finish the test for an hour.

What is the change here? I see suspend is called from receive_error_message

Comment on lines 161 to 162
/// FIXME: This setting needs to be true for now, due to limitations in
/// the load balancer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment be updated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link
Contributor
@michael-wb michael-wb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good - a few comments/questions

run_tests_against_baas: On
c_compiler: "./clang_binaries/bin/clang"
cxx_compiler: "./clang_binaries/bin/clang++"
enable_sync_multiplexing: On
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't multiplexing enabled by default? Do we need the extra variant with the env_var and CMAKE setting? And if we keep it, should it also run the sync or object store tests instead of simply compiling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this PR just fixes it, but doesn't turn it on by default. My plan was to release this and do a separate much smaller PR that enables it by default and then this variant will be switched to test that sync works with multiplexing turned off so we don't regress and of those code paths the way multiplexing regressed.

Comment on lines +718 to +721
if (error_code == sync::websocket::WebSocketError::websocket_read_error ||
error_code == sync::websocket::WebSocketError::websocket_write_error) {
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't clear out the WebSocket object in the Connection when the normal websocket close code is received. This is done for voluntary and involuntary disconnect initiated by the client. Should it also be done in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that happen at a lower level? All this does is make it so we don't notify the user of an error if the error is just the websocket getting closed. These get hit when the websocket is closed unexpectedly without a close code.

}


friend inline bool operator<(const ServerEndpoint& lhs, const ServerEndpoint& rhs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -6690,6 +6690,49 @@ TEST(Sync_InvalidChangesetFromServer)
StringData(e.what()).contains("Failed to parse received changeset: Invalid interned string"));
}

TEST(Sync_DifferentUsersMultiplexing)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this test be protected by the REALM_ENABLE_SYNC_MULTPLEXING flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we explicitly set one_connection_per_session to false below. the REALM_ENABLE_SYNC_MULTIPLEXING just effects the default behavior - users and tests can still override it and force it to have one connection per session or not.

bool m_bind_message_sent; // Sending of BIND message has been initiated
bool m_ident_message_sent; // Sending of IDENT message has been initiated
bool m_unbind_message_sent; // Sending of UNBIND message has been initiated
bool m_unbind_message_send_complete; // Sending of UNBIND message has been completed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Copy link
Contributor
@michael-wb michael-wb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for getting this working!

CMakeLists.txt Outdated
@@ -246,6 +246,7 @@ option(REALM_ENABLE_MEMDEBUG "Add additional memory checks" OFF)
option(REALM_VALGRIND "Tell the test suite we are running with valgrind" OFF)
option(REALM_METRICS "Enable various metric tracking" ON)
option(REALM_INCLUDE_CERTS "Include a list of trust certificates in the build for SSL certificate verification" ${REALM_INCLUDE_CERTS_DEFAULT})
option(REALM_ENABLE_SYNC_MULTPLEXING "Enables sync session multi-plexing by default" ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I was referring to with my original comment below... Session Multiplexing will be enabled by default with this cmake directive - not sure if that's what you wanted...

@jbreams jbreams merged commit 804d84e into master Apr 21, 2023
@jbreams jbreams deleted the jbr/fix_user_sync_mode_mplexing branch April 21, 2023 21:50
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0