8000 Integrate PBS->FLX client migration into protocol by michael-wb · Pull Request #6355 · realm/realm-core · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Integrate PBS->FLX client migration into protocol #6355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Mar 21, 2023

Conversation

michael-wb
Copy link
Contributor

What, How & Why?

Implementation of the protocol updates to start/cancel the PBS->FLX client migration. The protocol version will not be bumped to v8 until the end of the project.

Fixes #6341

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • C-API, if public C++ API changed.

Copy link
Contributor
@jbreams jbreams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is a draft, so feel free to tell me to go away, but here are a few early thoughts on this.

@michael-wb michael-wb marked this pull request as ready for review March 3, 2023 18:40
8000

// SyncMetadataSchemas manages the schema version numbers for different groups of internal tables used
// within sync.
class SyncMetadataSchemaVersions : public SyncMetadataSchemaVersionsReader {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

public:
explicit SyncMetadataSchemaVersions(const TransactionRef& ref);
explicit SyncMetadataSchemaVersionsReader(const TransactionRef& ref);

util::Optional<int64_t> get_version_for(const TransactionRef& tr, std::string_view schema_group_name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be std::optional

// * the metadata schema version table was not created during construction (i.e. read-only flag was set)
// * the metadata schema version table could not be read
// * the legacy metadata schema version data still exists and has not been converted
bool is_initialized() const
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is not accurate anymore. Do we need this method? Since get_version_for returns an optional, there is no problem if the metadata is not available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you're right - I was using the is_initialized() to indicate whether or not the data could be read, but, since the optional value is returned by get_version_for(), it doesn't matter whether the table existed or that the value hadn't been written yet. Either way, the version is not available and the data will need to be written at some point before it can be used.

I'll remove it.

Copy link
Collaborator
@danieltabacaru danieltabacaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I may need to change a few things when client reset handling is glued in, but nothing prevents merging these changes.

Comment on lines 639 to 640
// Should not receive this error if original sync config is FLX
REALM_ASSERT(!m_original_sync_config->flx_sync_requested);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not guaranteed unfortunately. This assertion will fail if the admin reverted the migration after starting to use an FLX config to open their realms.

Copy link
Contributor Author

There was a problem hiding this comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - so this could fail if the app was updated to use Native FLX, but the server was rolled back to PBS. Unfortunately, there is nothing we can do to work in this state, so we should propagate that error back up to the client app.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't have a pbs config anymore, there is no way to rollback unless the server provides the missing data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, agreed. just wanted to double check if REALM_ASSERT is the behavior we want here, since IIUC this will crash the process and tell the user to file a ticket, which would not be useful since this is a known issue. is there a more graceful way to fail here?

also, would REALM_ASSERT trigger in a release build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it to use the "switch to pbs" error if this happens and added a test to verify trying to connect as FLX after the server was rolled back.

Comment on lines 143 to 149
case ProtocolError::migrate_to_flx:
return "Server migrated to flexible based sync - migrating client to flexible based sync";
case ProtocolError::bad_progress:
return "Bad progress information (DOWNLOAD)";
case ProtocolError::revert_to_pbs:
return "Server rolled back after flexible based sync migration - reverting client to partition based "
"sync";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flexible based sync -> flexible sync

SyncConfig::FLXSyncEnabled{});
auto [err_promise, err_future] = util::make_promise_future<void>();
util::CopyablePromiseHolder promise(std::move(err_promise));
auto original_error_handler = std::move(flx_config.sync_config->error_handler);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this original error handler here something other than just the default error handler that aborts? Do we need to go through the trouble of saving/calling it?

{
session.handle_error(std::move(error));
}
static void handle_error(SyncSession& session, sync::SessionErrorInfo&& error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we actually add an overload of handle_error that takes a SyncError here? I believe SDKs actually use this for testing. They won't be able to mock out the migration query strong responses that way, but I think that's okay and it'll make adopting this easier for SDKs and our existing tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming the SyncError should be converted to a SessionErrorInfo and passed through the current handle_error() function.

@@ -358,12 +359,18 @@ class SyncSession : public std::enable_shared_from_this<SyncSession> {

SyncSession(_impl::SyncClient&, std::shared_ptr<DB>, const RealmConfig&, SyncManager* sync_manager);

// Create a subscription store pointer based on the flexible based sync configuration or return
// null if not using flexible sync.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this currently doesn't return anything? Also can we add a parameter name to the boolean parameter here or something so it's clear in the header what it does?

err_handler(sess, err);
};
auto flx_realm = Realm::get_shared_realm(flx_config);
timed_sleeping_wait_for([error_future = std::move(err_future)] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do wait_for_future(std::move(error_future), std::chrono::seconds(30)).get() here.

@@ -21,8 +21,6 @@ using namespace realm::fixtures;

namespace {

using ErrorInfo = Session::ErrorInfo;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, Session::ErrorInfo is a type alias of using ErrorInfo = SessionErrorInfo, which means lots of testing changes in this pr seem to be "change ErrorInfo which was Session::ErrorInfo which is actually SessionErrorInfo to SessionErrorInfo" Can we just remove the ErrorInfo alias in the Session class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean - these changes have already been made and the alias has been removed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to fix it here, but I was talking about this

using ErrorInfo = SessionErrorInfo;

}(m_config.sync_config&& m_config.sync_config->flx_sync_requested))
: m_config{config}
, m_db{std::move(db)}
, m_flx_subscription_store{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiniest of nits: we don't need to explicitly initialize this if there's a default constructor.

auto& history = static_cast<sync::ClientReplication&>(*m_db->get_replication());
if (!flx_sync_requested) {
if (m_flx_subscription_store) {
history.set_write_validator_factory(nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So - thinking about this a little more - the write validator factory here just updates a UniqueFunction in a non-thread-safe way. We always got away with this because m_flx_subscription_store and the history validator were only ever updated in the constructor of SyncSession. Is this going to be unsafe now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called from two places: constructor (safe) and do_become_inactive (which I think it's safe since the session is not active anymore)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but you can have a Realm open with the sync session inactive tho. My point is that regardless of whats going on with the SyncSession, don't we need to block write transactions that might try to use the write validator?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We can easily protect set_write_validator_factory and make_write_validator with a lock, but the problem is the user can still commit changes. I think we have the same problem below where we create the subscription store. We can probably work around it if we set the validator from a write transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding realm transaction locking around the set_write_validator_factory calls, but got a TSAN error when the subscription store was being constructed, so I removed it (for now) - we'll need to revisit this in our next PRs.

@@ -191,6 +197,11 @@ void SyncSession::do_become_inactive(util::CheckedUniqueLock lock, Status status
if (m_sync_manager) {
m_sync_manager->unregister_session(m_db->get_path());
}
if (m_needs_subscription_store_updated) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need to do this here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea is to add/remove the subscription store when the session is closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when m_session is closed because a migrate error message was received from the server? Don't we bypass this function after a migration by calling restart_session here? Or is there some case where the user could close their session that we need to handle?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the session is closed in download_fresh_realm (once the client reset handling is implemented)

flx_sync_requested = m_config.sync_config->flx_sync_requested;
}

util::CheckedLockGuard cfg_lock(m_state_mutex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this state_lock or something since it's not locking the config mutex?

@@ -191,6 +197,11 @@ void SyncSession::do_become_inactive(util::CheckedUniqueLock lock, Status status
if (m_sync_manager) {
m_sync_manager->unregister_session(m_db->get_path());
}
if (m_needs_subscription_store_updated) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when m_session is closed because a migrate error message was received from the server? Don't we bypass this function after a migration by calling restart_session here? Or is there some case where the user could close their session that we need to handle?

if (!flx_sync_requested) {
if (m_flx_subscription_store) {
history.set_write_validator_factory(nullptr);
m_flx_subscription_store.reset();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to do any clean-up here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is clean-up for the subscription store, but that will be done in another PR.

Copy link
Contributor
@jbreams jbreams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it when you have a green build.

@michael-wb
Copy link
Contributor Author

The Jenkins failure is unrelated to these changes and are being addressed in #6403

@michael-wb michael-wb merged commit fc56112 into master Mar 21, 2023
@michael-wb michael-wb deleted the mwb/migration-protocol branch March 21, 2023 18:16
@@ -140,8 +140,13 @@ const char* get_protocol_error_message(int error_code) noexcept
case ProtocolError::compensating_write:
return "Client attempted a write that is disallowed by permissions, or modifies an object outside the "
"current query, and the server undid the change";
case ProtocolError::migrate_to_flx:
return "Server mi FA89 grated to flexible based sync - migrating client to flexible based sync";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "Server migrated to flexible based sync - migrating client to flexible based sync";
return "Server migrated to flexible sync - migrating client to flexible sync";

// Update error to the "switch to PBS" connect error
error = sync::SessionErrorInfo(make_error_code(sync::ProtocolError::switch_to_pbs),
"Server rolled back after flexible sync migration - cannot "
"connect with flexible based sync config",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"connect with flexible based sync config",
"connect with flexible sync config",

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate client migration into protocol
4 participants
0