10000 Add BAAS Admin API command to start/revert PBS->FLX server migration by michael-wb · Pull Request #6366 · realm/realm-core · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Add BAAS Admin API command to start/revert PBS->FLX server migration #6366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 16, 2023

Conversation

michael-wb
Copy link
Contributor
@michael-wb michael-wb commented Mar 7, 2023

What, How & Why?

Add a functions to the admin api test interface for:

  • triggering the PBS->FLX migration and rollback operations
  • checking the status of the migration/rollback

Add a test to verify the operation and demonstrate/test the operation of the new admin api functions.

Fixes #6343

☑️ ToDos

  • 📝 Changelog update
  • 🚦 Tests (or not relevant)
  • [ ] C-API, if public C++ API changed.

@cla-bot cla-bot bot added the cla: yes label Mar 7, 2023
@michael-wb michael-wb force-pushed the mwb/migration-admin-api branch from cb85651 to cafa320 Compare March 7, 2023 05:07
@michael-wb michael-wb force-pushed the mwb/migration-admin-api branch from cafa320 to f041764 Compare March 12, 2023 04:21
@michael-wb michael-wb marked this pull request as ready for review March 13, 2023 04:44
@michael-wb
Copy link
Contributor Author

The failure on Jenkins is due to receiving an empty changeset for the bootstrap data after the migration to FLX has occurred. I posted a note about this on the PBS FLX migration channel.

CHANGELOG.md Outdated
@@ -63,6 +63,7 @@
* Add CAPI test for Binding Callback Thread Observer. ([PR #6156](https://github.com/realm/realm-core/pull/6156))
* Implement MigrationStore to support migration from PBS to FLX ([PR #6324](https://github.com/realm/realm-core/pull/6324))
* Removed overloads of `Session::bind()` that allow binding a sync session to server other than the one configured in `Session::Config` ([PR #6358](https://github.com/realm/realm-core/pull/6358)).
* Add admin api and test for performing the PBS->FLX migration and roll back on the server. (PR [#6366](https://github.com/realm/realm-core/pull/6366))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase this because master was released

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - I did a rebase and it still put it in the wrong place - fixed

bool complete = false;
};

MigrationStatus get_migration_status(const std::string& app_id) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is called without a migration in place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a migration is not currently active, it will return a MigrationStatus with an empty status message and the isMigrated value will reflect the current migration state.

@@ -119,6 +120,17 @@ class AdminAPISession {
bool is_sync_terminated(const std::string& app_id) const;
bool is_initial_sync_complete(const std::string& app_id) const;

struct MigrationStatus {
std::string statusMessage;
std::string errorMessage;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this be optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - made it optional

@@ -0,0 +1,200 @@
#include <catch2/catch_all.hpp>
#include <chrono>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add this after the realm includes

#include <catch2/catch_all.hpp>
#include <chrono>

#include "flx_sync_harness.hpp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would use full paths and angled brackets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the includes for the files in the src/ directory to use angled brackets. For the include files in the test/ directory, I updated them to use the appropriate path for the include file (similar to the other files in test/object-store/)

status = app_session.admin_api.get_migration_status(app_session.server_app_id);
if (logger && last_status != status.statusMessage) {
last_status = status.statusMessage;
logger->debug("PBS->FLX Migration status: %1", last_status);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also be waiting for the status of a rollback ("FLX->PBS....")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PBS->FLX string to be dynamic based on the operation being performed (i.e. PBS->FLX if migrate_to_flx is true, otherwise FLX->PBS)

REQUIRE(switch_to_flx == status.isMigrated);
if (logger) {
if (!status.errorMessage.empty())
logger->debug("PBS->FLX Migration returned error: %1", status.errorMessage);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

REQUIRE(status.errorMessage.empty());
}

// Populates a FLXSyncTestHarness with the g_large_array_schema with objects that are large enough that
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not accurate

9E7A

trigger_server_migration(session.app_session(), false, logger_ptr);

{
SyncTestFile pbs_config(session.app(), partition1, server_app_config.schema);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should check partition2 too.

trigger_server_migration(session.app_session(), true, logger_ptr);

{
SyncTestFile flx_config(session.app()->current_user(), server_app_config.schema,
Copy link
Collaborator
@danieltabacaru danieltabacaru Mar 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand this test. Shouldn't the user connect as PBS to trigger the migration on the client?
As it stands, it's a native flx user which connects and gets the pbs data (still valuable). In this case, it should create "normal" subscriptions then (a subscription on Dog class without the partition string).

Edit: I think I finally understand the test: it's how the client reconnects after the migration is triggered on the client (as flx and creates subscriptions using the partition key(s)). I still think we should have a test with a native flx user which didn't go through the migration (my second sentence above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is less about using the client migration feature and more about validating being able to trigger a migration on the server and connecting to it as PBS (in not migrated state) or FLX (in migrated state). The FLX subscriptions I created are representative of the subscriptions that will be automatically generated.

We will need to add a separate test to verify the client migration, once it is complete.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

@danieltabacaru
Copy link
Collaborator
danieltabacaru commented Mar 13, 2023

The failure on Jenkins is due to receiving an empty changeset for the bootstrap data after the migration to FLX has occurred. I posted a note about this on the PBS FLX migration channel.

There is another failure, a build warning:
catching polymorphic type 'class std::runtime_error' by value [-Wcatch-value=] at flx_migration.cpp:41

Copy link
Collaborator
@danieltabacaru danieltabacaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michael-wb
Copy link
Contributor Author

Kiro has a BAAS update https://github.com/10gen/baas/pull/8980 that will address the 0 == 5 failure when checking the table size after the PBS->FLX migration has occurred.

Copy link
Contributor
@jbreams jbreams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits/questions.


using namespace realm;

static void trigger_server_migration(const AppSession& app_session, bool switch_to_flx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we never call this with a non-null logger parameter, let's just remove the default nullptr value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal was to make it optional in case future tests didn't use a logger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We def shouldn't do this in this PR, but I kinda think we should make it so there's always a logger available in tests. Like when you create the TestAppSession or something you get a logger that's properly configured.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, similar to the logger in TestContext for the core-and sync-tests. I'll create a task and do that separately.

}

// Add a set of count number of Dog objects to the realm
static std::vector<ObjectId> fill_test_data(SyncTestFile& config, std::string partition, int start = 1, int count = 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we never call these without specifying the values for start/count, let's remove their default values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

// Add some objects with the provided partition value
for (int i = 0; i < count; i++, ++start) {
auto id = ObjectId::gen();
auto obj = Object::create(c, realm, "Dog",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit. Feel free to ignore me here 😄, but i kinda don't like re-using the "Dog" classes from the app integration tests. Can we name this a non-silly name like Object and have non-fake properties like "breed"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I was using the default schema that was included in default_app_config()

try {
return progress_endpoint.get_json();
}
catch (const std::exception& e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit, but shouldn't we just let this exception get thrown so we get a stacktrace in the test logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to catch the case where the http status was not between 200-299, but it is actually a terminate, which can't be caught, so I'll put it back.

}();
auto errorMessage = progress_result["errorMessage"];
if (errorMessage.is_string() && !errorMessage.get<std::string>().empty()) {
status.errorMessage.emplace(errorMessage.get<std::string>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not just throw here and on 735?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can change these to throw


using namespace realm;

static void trigger_server_migration(const AppSession& app_session, bool switch_to_flx,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We def shouldn't do this in this PR, but I kinda think we should make it so there's always a logger available in tests. Like when you create the TestAppSession or something you get a logger that's properly configured.

@michael-wb michael-wb merged commit c1cd641 into master Mar 16, 2023
@michael-wb michael-wb deleted the mwb/migration-admin-api branch March 16, 2023 17:46
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add BAAS Admin API command to start/revert PBS->FLX server migration
3 participants
0