Integrate PBS->FLX client migration into protocol #6355

michael-wb · 2023-03-02T13:57:46Z

What, How & Why?

Implementation of the protocol updates to start/cancel the PBS->FLX client migration. The protocol version will not be bumped to v8 until the end of the project.

Fixes #6341
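A minimal sketch of the start/cancel flow described above. It assumes a client-side store that records two things: whether the server has migrated, and which query string to use when bootstrapping FLX subscriptions. MigrationStore, MigrationState, ServerSignal and on_signal are hypothetical names for illustration and are not the realm-core API. Only the migrate_to_flx and revert_to_pbs names come from this PR.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical names for illustration; not the realm-core MigrationStore API.
enum class MigrationState { NotMigrated, Migrated };
enum class ServerSignal { migrate_to_flx, revert_to_pbs };

struct MigrationStore {
    MigrationState state = MigrationState::NotMigrated;
    // Query used to bootstrap FLX subscriptions; only set while migrated.
    std::optional<std::string> query_string;

    void on_signal(ServerSignal signal, const std::string& query = {})
    {
        switch (signal) {
            case ServerSignal::migrate_to_flx: // start: server has migrated to FLX
                state = MigrationState::Migrated;
                query_string = query;
                break;
            case ServerSignal::revert_to_pbs: // cancel: server rolled back to PBS
                state = MigrationState::NotMigrated;
                query_string.reset();
                break;
        }
    }

    // Whether the next session should be opened with a flexible sync config.
    bool use_flx() const
    {
        return state == MigrationState::Migrated;
    }
};
```

Keeping the query string next to the state means a cancel clears both together, so a reverted client cannot accidentally reuse a stale FLX query.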

☑️ ToDos

📝 Changelog update
🚦 Tests (or not relevant)
C-API, if public C++ API changed.

jbreams

I know this is a draft, so feel free to tell me to go away, but here are a few early thoughts on this.

src/realm/sync/subscriptions.hpp

src/realm/object-store/sync/sync_session.cpp

src/realm/sync/subscriptions.cpp

src/realm/sync/noinst/protocol_codec.hpp

src/realm/object-store/sync/sync_session.cpp

src/realm/sync/noinst/migration_store.cpp

test/object-store/realm.cpp

src/realm/sync/noinst/migration_store.cpp


src/realm/sync/noinst/migration_store.hpp

src/realm/object-store/sync/sync_session.cpp


src/realm/object-store/sync/sync_session.cpp

src/realm/object-store/sync/sync_session.hpp

test/sync_fixtures.hpp

src/realm/object-store/sync/sync_session.cpp

src/realm/sync/noinst/migration_store.cpp

danieltabacaru · 2023-03-18T06:48:30Z

src/realm/sync/noinst/sync_metadata_schema.hpp

+
+// SyncMetadataSchemas manages the schema version numbers for different groups of internal tables used
+// within sync.
+class SyncMetadataSchemaVersions : public SyncMetadataSchemaVersionsReader {


danieltabacaru · 2023-03-19T08:37:35Z

src/realm/sync/noinst/sync_metadata_schema.hpp

 public:
-    explicit SyncMetadataSchemaVersions(const TransactionRef& ref);
+    explicit SyncMetadataSchemaVersionsReader(const TransactionRef& ref);

    util::Optional<int64_t> get_version_for(const TransactionRef& tr, std::string_view schema_group_name);


This could be std::optional.

danieltabacaru · 2023-03-19T08:40:35Z

src/realm/sync/noinst/sync_metadata_schema.hpp

+    // * the metadata schema version table was not created during construction (i.e. read-only flag was set)
+    // * the metadata schema version table could not be read
+    // * the legacy metadata schema version data still exists and has not been converted
+    bool is_initialized() const


The comment is not accurate anymore. Do we need this method? Since get_version_for returns an optional, there is no problem if the metadata is not available.

Yeah, I think you're right - I was using the is_initialized() to indicate whether or not the data could be read, but, since the optional value is returned by get_version_for(), it doesn't matter whether the table existed or that the value hadn't been written yet. Either way, the version is not available and the data will need to be written at some point before it can be used.

I'll remove it.
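A minimal sketch of why is_initialized() becomes redundant. It assumes the version table can be modeled as an optional map. If the table is missing, or if the group was never written, the caller gets std::nullopt in both cases and handles them the same way. VersionTable and the group names in the usage are hypothetical. Only the get_version_for name and its optional return type come from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the schema version table: a table that was never
// created is std::nullopt, and an unwritten schema group is a missing key.
using VersionTable = std::optional<std::map<std::string, int64_t, std::less<>>>;

std::optional<int64_t> get_version_for(const VersionTable& table, std::string_view schema_group_name)
{
    if (!table)
        return std::nullopt; // table was never created (e.g. opened read-only)
    auto it = table->find(schema_group_name); // std::less<> allows lookup by string_view
    if (it == table->end())
        return std::nullopt; // table exists, but this group was never written
    return it->second;
}
```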

danieltabacaru

Looks good. I may need to change a few things when client reset handling is glued in, but nothing prevents merging these changes.

kmorkos · 2023-03-20T15:06:17Z

src/realm/object-store/sync/sync_session.cpp

+                // Should not receive this error if original sync config is FLX
+                REALM_ASSERT(!m_original_sync_config->flx_sync_requested);


This is not guaranteed unfortunately. This assertion will fail if the admin reverted the migration after starting to use an FLX config to open their realms.

Right - so this could fail if the app was updated to use Native FLX, but the server was rolled back to PBS. Unfortunately, there is nothing we can do to work in this state, so we should propagate that error back up to the client app.

Since we don't have a PBS config anymore, there is no way to roll back unless the server provides the missing data.

Right, agreed. I just wanted to double-check whether REALM_ASSERT is the behavior we want here, since IIUC this will crash the process and tell the user to file a ticket, which would not be useful because this is a known issue. Is there a more graceful way to fail here?

Also, would REALM_ASSERT trigger in a release build?

I updated it to use the "switch to pbs" error if this happens and added a test to verify trying to connect as FLX after the server was rolled back.
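Here is a sketch of the more graceful path. It assumes that the error which used to trip the assertion is revert_to_pbs. It also uses simplified, hypothetical stand-ins for sync::ProtocolError and sync::SessionErrorInfo; the real types carry more fields. The sketch does not assert. It rewrites the error into the "switch to PBS" error and returns it so it can be reported to the app's error handler. handle_revert is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical stand-ins for sync::ProtocolError and sync::SessionErrorInfo.
enum class ProtocolError { migrate_to_flx, revert_to_pbs, switch_to_pbs };

struct SessionErrorInfo {
    ProtocolError code;
    std::string message;
};

// Rather than REALM_ASSERT(!original_config_is_flx): an FLX-only client that is told
// to revert has no PBS config to fall back to, so report a "switch to PBS" error
// to the app instead of crashing the process.
SessionErrorInfo handle_revert(SessionErrorInfo error, bool original_config_is_flx)
{
    if (error.code == ProtocolError::revert_to_pbs && original_config_is_flx) {
        return SessionErrorInfo{ProtocolError::switch_to_pbs,
                                "Server rolled back after flexible sync migration - cannot "
                                "connect with flexible sync config"};
    }
    return error; // config originated as PBS: the normal revert path handles it
}
```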

kmorkos · 2023-03-20T15:22:34Z

src/realm/sync/protocol.cpp

+        case ProtocolError::migrate_to_flx:
+            return "Server migrated to flexible based sync - migrating client to flexible based sync";
        case ProtocolError::bad_progress:
            return "Bad progress information (DOWNLOAD)";
+        case ProtocolError::revert_to_pbs:
+            return "Server rolled back after flexible based sync migration - reverting client to partition based "
+                   "sync";


flexible based sync -> flexible sync
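A sketch of the message table with the suggested "flexible sync" wording applied. The enum is a hypothetical subset of ProtocolError, and no real numeric values are assumed. The function takes an int, as get_protocol_error_message does in the diff, and casts it. Returning nullptr for unknown codes is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical subset of sync::ProtocolError; real codes and values differ.
enum class ProtocolError { bad_progress, migrate_to_flx, revert_to_pbs };

const char* get_protocol_error_message(int error_code) noexcept
{
    switch (static_cast<ProtocolError>(error_code)) {
        case ProtocolError::migrate_to_flx:
            return "Server migrated to flexible sync - migrating client to flexible sync";
        case ProtocolError::bad_progress:
            return "Bad progress information (DOWNLOAD)";
        case ProtocolError::revert_to_pbs:
            return "Server rolled back after flexible sync migration - reverting client to partition based "
                   "sync";
    }
    return nullptr; // in this sketch, unknown codes have no message
}
```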


jbreams · 2023-03-20T19:04:46Z

test/object-store/sync/flx_migration.cpp

+                                SyncConfig::FLXSyncEnabled{});
+        auto [err_promise, err_future] = util::make_promise_future<void>();
+        util::CopyablePromiseHolder promise(std::move(err_promise));
+        auto original_error_handler = std::move(flx_config.sync_config->error_handler);


Is this original error handler here something other than just the default error handler that aborts? Do we need to go through the trouble of saving/calling it?

jbreams · 2023-03-20T19:13:48Z

src/realm/object-store/sync/sync_session.hpp

-        {
-            session.handle_error(std::move(error));
-        }
+        static void handle_error(SyncSession& session, sync::SessionErrorInfo&& error);


Can we actually add an overload of handle_error that takes a SyncError here? I believe SDKs actually use this for testing. They won't be able to mock out the migration query string responses that way, but I think that's okay, and it'll make adopting this easier for SDKs and our existing tests.

I'm assuming the SyncError should be converted to a SessionErrorInfo and passed through the current handle_error() function.
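A sketch of the overload described here. The SyncError is converted and then forwarded through the existing SessionErrorInfo path. All types are trimmed-down, hypothetical stand-ins, and the real structs carry more fields. The mapping try_again = !is_fatal is an assumption made for illustration and is not confirmed by the PR.

```cpp
#include <cassert>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical, trimmed-down stand-ins for SyncError and sync::SessionErrorInfo.
struct SyncError {
    std::error_code error_code;
    std::string message;
    bool is_fatal;
};

struct SessionErrorInfo {
    std::error_code error_code;
    std::string message;
    bool try_again;
};

struct SyncSession {
    std::string last_error; // records what handle_error saw, for the example
    void handle_error(SessionErrorInfo&& error) { last_error = std::move(error.message); }
};

struct OnlyForTesting {
    // Existing entry point: forwards the protocol-level error info.
    static void handle_error(SyncSession& session, SessionErrorInfo&& error)
    {
        session.handle_error(std::move(error));
    }
    // Convenience overload: convert a SyncError, then reuse the existing path.
    static void handle_error(SyncSession& session, SyncError error)
    {
        handle_error(session, SessionErrorInfo{error.error_code, std::move(error.message), !error.is_fatal});
    }
};
```

Because the overload only converts and forwards, there is still a single code path that actually handles errors.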

jbreams · 2023-03-20T19:14:21Z

src/realm/object-store/sync/sync_session.hpp

@@ -358,12 +359,18 @@ class SyncSession : public std::enable_shared_from_this<SyncSession> {

    SyncSession(_impl::SyncClient&, std::shared_ptr<DB>, const RealmConfig&, SyncManager* sync_manager);

+    // Create a subscription store pointer based on the flexible based sync configuration or return
+    // null if not using flexible sync.


This currently doesn't return anything? Also, can we add a name to the boolean parameter here, or something similar, so it's clear from the header what it does?
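One way to address both points, as a hypothetical sketch. The function returns what the header comment promises, and the boolean parameter is named. SubscriptionStore is a placeholder type, and the free-function form is an assumption; in the PR this is a SyncSession member.

```cpp
#include <cassert>
#include <memory>

struct SubscriptionStore {}; // hypothetical placeholder for sync::SubscriptionStore

// Create a subscription store if flexible sync is requested, or return null if
// not using flexible sync. The named parameter documents itself in the header.
std::shared_ptr<SubscriptionStore> create_subscription_store(bool flx_sync_requested)
{
    if (!flx_sync_requested)
        return nullptr;
    return std::make_shared<SubscriptionStore>();
}
```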

jbreams · 2023-03-20T19:34:37Z

test/object-store/sync/flx_migration.cpp

+                err_handler(sess, err);
+            };
+        auto flx_realm = Realm::get_shared_realm(flx_config);
+        timed_sleeping_wait_for([error_future = std::move(err_future)] {


You can do wait_for_future(std::move(error_future), std::chrono::seconds(30)).get() here.
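An equivalent pattern using only the standard library, as a sketch. The wrapper fulfills a promise on the first error, and the test blocks on the future with a 30-second timeout. wait_for_future is realm's own test helper and is not reproduced here. make_waiting_handler and the string-based ErrorHandler are hypothetical simplifications of the SyncConfig error handler.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplification of the SyncConfig error handler signature.
using ErrorHandler = std::function<void(const std::string& message)>;

// Returns a handler that fulfills a promise with the first error it sees, plus the
// matching future. std::promise is move-only, so it is held in a shared_ptr to keep
// the handler copyable (the role util::CopyablePromiseHolder plays in the test).
std::pair<ErrorHandler, std::future<std::string>> make_waiting_handler()
{
    auto promise = std::make_shared<std::promise<std::string>>();
    auto fired = std::make_shared<std::atomic<bool>>(false);
    std::future<std::string> future = promise->get_future();
    ErrorHandler handler = [promise, fired](const std::string& message) {
        // Later errors are ignored: a promise can only be satisfied once.
        if (!fired->exchange(true))
            promise->set_value(message);
    };
    return {std::move(handler), std::move(future)};
}
```

Waiting on the future removes the polling loop, and the timeout still bounds the test if the error never arrives.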

jbreams · 2023-03-20T19:38:58Z

test/test_client_reset.cpp

@@ -21,8 +21,6 @@ using namespace realm::fixtures;

 namespace {

-using ErrorInfo = Session::ErrorInfo;


So, Session::ErrorInfo is just a type alias (using ErrorInfo = SessionErrorInfo), which means a lot of the test changes in this PR amount to renaming ErrorInfo (which was Session::ErrorInfo, which is really SessionErrorInfo) to SessionErrorInfo. Can we just remove the ErrorInfo alias from the Session class?

I'm not sure what you mean - these changes have already been made and the alias has been removed.

We don't need to fix it here, but I was talking about this

realm-core/src/realm/sync/client.hpp

Line 169 in 84757c4

using ErrorInfo = SessionErrorInfo;
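A tiny sketch of why those test edits are purely cosmetic. The alias names the same type, so Session::ErrorInfo and SessionErrorInfo are interchangeable. Session and SessionErrorInfo here are empty placeholders, not the real classes.

```cpp
#include <cassert>
#include <type_traits>

struct SessionErrorInfo {}; // placeholder for sync::SessionErrorInfo

class Session {
public:
    using ErrorInfo = SessionErrorInfo; // the alias suggested for removal
};

// Same type, so replacing Session::ErrorInfo with SessionErrorInfo changes no behavior.
static_assert(std::is_same_v<Session::ErrorInfo, SessionErrorInfo>);
```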

jbreams · 2023-03-20T19:51:56Z

src/realm/object-store/sync/sync_session.cpp

-    }(m_config.sync_config&& m_config.sync_config->flx_sync_requested))
+    : m_config{config}
+    , m_db{std::move(db)}
+    , m_flx_subscription_store{}


tiniest of nits: we don't need to explicitly initialize this if there's a default constructor.
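A sketch of the nit. It assumes m_flx_subscription_store is a std::shared_ptr. A shared_ptr member left out of the constructor's initializer list is still default-constructed to null, so listing it with {} adds nothing. SessionMembers is a hypothetical stand-in for SyncSession.

```cpp
#include <cassert>
#include <memory>

struct SubscriptionStore {}; // hypothetical placeholder

struct SessionMembers {
    std::shared_ptr<SubscriptionStore> m_flx_subscription_store; // no initializer needed
    bool m_is_flx = false;

    // m_flx_subscription_store is omitted from the init list; its default
    // constructor still runs and leaves it null.
    explicit SessionMembers(bool is_flx)
        : m_is_flx{is_flx}
    {
    }
};
```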

jbreams · 2023-03-20T20:08:55Z

src/realm/object-store/sync/sync_session.cpp

+    auto& history = static_cast<sync::ClientReplication&>(*m_db->get_replication());
+    if (!flx_sync_requested) {
+        if (m_flx_subscription_store) {
+            history.set_write_validator_factory(nullptr);


So - thinking about this a little more - the write validator factory here just updates a UniqueFunction in a non-thread-safe way. We always got away with this because m_flx_subscription_store and the history validator were only ever updated in the constructor of SyncSession. Is this going to be unsafe now?

It is called from two places: the constructor (safe) and do_become_inactive (which I think is safe, since the session is not active anymore).

Right, but you can have a Realm open while the sync session is inactive, though. My point is that regardless of what's going on with the SyncSession, don't we need to block write transactions that might try to use the write validator?

Right. We can easily protect set_write_validator_factory and make_write_validator with a lock, but the problem is the user can still commit changes. I think we have the same problem below where we create the subscription store. We can probably work around it if we set the validator from a write transaction.

I tried adding realm transaction locking around the set_write_validator_factory calls, but got a TSAN error when the subscription store was being constructed, so I removed it (for now) - we'll need to revisit this in our next PRs.
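A sketch of the "protect both calls with a lock" idea from this thread. History, WriteValidator and WriteValidatorFactory are simplified, hypothetical stand-ins for the ClientReplication members. As noted above, this only makes swapping the factory race-free. A commit that already obtained the old validator can still run with it, which is why setting the validator from within a write transaction was suggested.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical: a write validator checks a pending write; the factory builds one.
using WriteValidator = std::function<bool(int table_key)>;
using WriteValidatorFactory = std::function<WriteValidator()>;

class History {
public:
    void set_write_validator_factory(WriteValidatorFactory factory)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_factory = std::move(factory);
    }

    // Copy the factory under the lock, then call it outside the lock so a factory
    // that calls back into History cannot deadlock. Empty result means "no checks".
    WriteValidator make_write_validator()
    {
        WriteValidatorFactory factory;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            factory = m_factory;
        }
        return factory ? factory() : WriteValidator{};
    }

private:
    std::mutex m_mutex;
    WriteValidatorFactory m_factory;
};
```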

jbreams · 2023-03-20T20:12:10Z

src/realm/object-store/sync/sync_session.cpp

@@ -191,6 +197,11 @@ void SyncSession::do_become_inactive(util::CheckedUniqueLock lock, Status status
    if (m_sync_manager) {
        m_sync_manager->unregister_session(m_db->get_path());
    }
+    if (m_needs_subscription_store_updated) {


I'm not sure why we need to do this here?

I think the idea is to add/remove the subscription store when the session is closed.

when m_session is closed because a migrate error message was received from the server? Don't we bypass this function after a migration by calling restart_session here? Or is there some case where the user could close their session that we need to handle?

The session is closed in download_fresh_realm (once the client reset handling is implemented).
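A hypothetical sketch of the deferred-update pattern discussed in this thread. When the migration state changes, the store is only marked as stale. The actual add/remove happens once the session becomes inactive. SessionSketch and its members are illustrative names modeled on m_needs_subscription_store_updated, and are not the real SyncSession code. Clean-up of stored subscriptions is left out, matching the follow-up PR mentioned in the review.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

struct SubscriptionStore {}; // hypothetical placeholder

class SessionSketch {
public:
    // Called when a migrate/revert message changes which config should be used.
    void on_migration_changed(bool flx_now)
    {
        std::lock_guard<std::mutex> state_lock(m_state_mutex);
        m_flx_sync_requested = flx_now;
        m_needs_subscription_store_updated = true;
    }

    // Apply the pending change only once the session is no longer active.
    void do_become_inactive()
    {
        std::lock_guard<std::mutex> state_lock(m_state_mutex);
        if (!m_needs_subscription_store_updated)
            return;
        if (m_flx_sync_requested && !m_store)
            m_store = std::make_shared<SubscriptionStore>();
        else if (!m_flx_sync_requested)
            m_store.reset(); // subscription clean-up deferred to a later PR
        m_needs_subscription_store_updated = false;
    }

    bool has_store() const { return m_store != nullptr; } // unlocked; fine for the example

private:
    std::mutex m_state_mutex;
    bool m_flx_sync_requested = false;
    bool m_needs_subscription_store_updated = false;
    std::shared_ptr<SubscriptionStore> m_store;
};
```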

jbreams · 2023-03-20T20:26:25Z

src/realm/object-store/sync/sync_session.cpp

+        flx_sync_requested = m_config.sync_config->flx_sync_requested;
+    }
+
+    util::CheckedLockGuard cfg_lock(m_state_mutex);


Can we call this state_lock or something, since it's not locking the config mutex?

jbreams · 2023-03-20T20:28:42Z

src/realm/object-store/sync/sync_session.cpp

@@ -191,6 +197,11 @@ void SyncSession::do_become_inactive(util::CheckedUniqueLock lock, Status status
    if (m_sync_manager) {
        m_sync_manager->unregister_session(m_db->get_path());
    }
+    if (m_needs_subscription_store_updated) {


when m_session is closed because a migrate error message was received from the server? Don't we bypass this function after a migration by calling restart_session here? Or is there some case where the user could close their session that we need to handle?

danieltabacaru · 2023-03-20T22:53:59Z

src/realm/object-store/sync/sync_session.cpp

+    if (!flx_sync_requested) {
+        if (m_flx_subscription_store) {
+            history.set_write_validator_factory(nullptr);
+            m_flx_subscription_store.reset();


Do we need to do any clean-up here?

There is clean-up for the subscription store, but that will be done in another PR.

jbreams

Ship it when you have a green build.

michael-wb · 2023-03-21T18:13:55Z

The Jenkins failure is unrelated to these changes and is being addressed in #6403.

kmorkos · 2023-03-21T18:02:16Z

src/realm/sync/protocol.cpp

@@ -140,8 +140,13 @@ const char* get_protocol_error_message(int error_code) noexcept
        case ProtocolError::compensating_write:
            return "Client attempted a write that is disallowed by permissions, or modifies an object outside the "
                   "current query, and the server undid the change";
+        case ProtocolError::migrate_to_flx:
+            return "Server migrated to flexible based sync - migrating client to flexible based sync";


Suggested change

return "Server migrated to flexible based sync - migrating client to flexible based sync";

return "Server migrated to flexible sync - migrating client to flexible sync";

kmorkos · 2023-03-21T18:02:47Z

src/realm/object-store/sync/sync_session.cpp

+                    // Update error to the "switch to PBS" connect error
+                    error = sync::SessionErrorInfo(make_error_code(sync::ProtocolError::switch_to_pbs),
+                                                   "Server rolled back after flexible sync migration - cannot "
+                                                   "connect with flexible based sync config",


Suggested change

"connect with flexible based sync config",

"connect with flexible sync config",

First cut of flx migration protocol handling

f20cb46

cla-bot bot added the cla: yes label Mar 2, 2023

github-actions bot assigned michael-wb Mar 2, 2023

Michael Wilkerson-Barker added 2 commits March 2, 2023 10:47

Updated SyncSession realm config initialization

73b47d8

Updated changelog

d446180

jbreams reviewed Mar 2, 2023

danieltabacaru reviewed Mar 3, 2023

src/realm/sync/noinst/migration_store.cpp (outdated)

Michael Wilkerson-Barker added 4 commits March 3, 2023 10:57

Updates from review

47635b2

Updated use of migration_store, since it can be null

39f8e4d

Reverted update_configuration

4abe097

Can't easily use static func for clear()

c4e3f5a

michael-wb marked this pull request as ready for review March 3, 2023 18:40

michael-wb requested review from jbreams and danieltabacaru March 3, 2023 19:07

Merge branch 'master' of github.com:realm/realm-core into mwb/migrati…

fe059a6

…on-protocol

danieltabacaru reviewed Mar 4, 2023

src/realm/sync/noinst/migration_store.hpp

danieltabacaru reviewed Mar 4, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru reviewed Mar 4, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru mentioned this pull request Mar 13, 2023

Add BAAS Admin API command to start/revert PBS->FLX server migration #6366

Merged


Michael Wilkerson-Barker added 5 commits March 15, 2023 09:13

Updated protocol handling to prevent releasing session waiters

6ba259a

Merge branch 'master' of github.com:realm/realm-core into mwb/migrati…

3a62c93

…on-protocol

updated changelog after release

ac00b73

Updates to fix tsan errors

072f429

Merge branch 'master' of github.com:realm/realm-core into mwb/migrati…

3f56270

…on-protocol

danieltabacaru reviewed Mar 16, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru reviewed Mar 16, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru reviewed Mar 16, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru reviewed Mar 16, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

Updates from review

2af6539

danieltabacaru reviewed Mar 16, 2023

src/realm/object-store/sync/sync_session.hpp

danieltabacaru reviewed Mar 17, 2023

test/sync_fixtures.hpp (outdated)

danieltabacaru reviewed Mar 17, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru reviewed Mar 17, 2023

src/realm/object-store/sync/sync_session.cpp (outdated)

danieltabacaru reviewed Mar 17, 2023

src/realm/sync/noinst/migration_store.cpp

danieltabacaru reviewed Mar 17, 2023

src/realm/sync/noinst/migration_store.cpp (outdated)

Michael Wilkerson-Barker added 3 commits March 18, 2023 00:01

updates from review - added test for SyncMetadataSchemaVersions

a0dd509

Added a few more SyncMetadataSchemaVersions tests

b4f888a

converted some util::Optionals to std::optional

f8e64c6

michael-wb requested a review from danieltabacaru March 18, 2023 05:53

danieltabacaru reviewed Mar 18, 2023

Updated some comments

538284e

danieltabacaru reviewed Mar 19, 2023

danieltabacaru approved these changes Mar 19, 2023

Removed is_initialized and util::Optional to std::optional

6533ab2

kmorkos reviewed Mar 20, 2023

Michael Wilkerson-Barker added 2 commits March 20, 2023 13:35

Updated error if using FLX after server rolled back

48a71d3

Merge branch 'master' of github.com:realm/realm-core into mwb/migrati…

84757c4

…on-protocol

jbreams reviewed Mar 20, 2023

danieltabacaru reviewed Mar 20, 2023

Michael Wilkerson-Barker added 2 commits March 21, 2023 01:22

More updates from review

ee32db7

Removed transaction locking for updating history validator

cee89f8

jbreams approved these changes Mar 21, 2023

michael-wb merged commit fc56112 into master Mar 21, 2023

michael-wb deleted the mwb/migration-protocol branch March 21, 2023 18:16

kmorkos approved these changes Mar 21, 2023

michael-wb mentioned this pull request Mar 21, 2023

Flexible sync string updates #6405

Merged

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
