8000 Parallelize pg_restore operations by hanefi · Pull Request #561 · dimitri/pgcopydb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Parallelize pg_restore operations #561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 27 additions & 5 deletions docs/ref/pgcopydb_clone.rst
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ Postgres instance to the target Postgres instance.
--dir Work directory to use
--table-jobs Number of concurrent COPY jobs to run
--index-jobs Number of concurrent CREATE INDEX jobs to run
--restore-jobs Number of concurrent jobs for pg_restore
--large-objects-jobs Number of concurrent Large Objects jobs to run
--split-tables-larger-than Same-table concurrency size threshold
--drop-if-exists On the target database, clean-up from a previous run first
Expand Down Expand Up @@ -115,6 +116,9 @@ The ``pgcopydb clone`` command implements the following steps:
When filtering is used, the ``pg_restore --use-list`` feature is used
to filter the list of objects to restore in this step.

This step uses as many as ``--restore-jobs`` jobs for ``pg_restore`` to
share the workload and restore the objects in parallel.

4. Then as many as ``--table-jobs`` COPY sub-processes are started to
share the workload and COPY the data from the source to the target
database one table at a time, in a loop.
Expand Down Expand Up @@ -146,14 +150,14 @@ The ``pgcopydb clone`` command implements the following steps:
be created in parallel with other indexes on the same table, avoiding
an EXCLUSIVE LOCK while creating the index.

8. As many as ``-table-jobs`` VACUUM ANALYZE sub-processes are started to
8. As many as ``--table-jobs`` VACUUM ANALYZE sub-processes are started to
share the workload. As soon as a table data COPY has completed, the
table is queued for processing by the VACUUM ANALYZE sub-processes.

9. An auxilliary process is loops over the sequences on the source
database and for each of them runs a separate query on the source to
fetch the ``last_value`` and the ``is_called`` metadata the same way
that pg_dump does.
9. An auxilliary process loops over the sequences on the source database and
for each of them runs a separate query on the source to fetch the
``last_value`` and the ``is_called`` metadata the same way that pg_dump
does.

For each sequence, pgcopydb then calls ``pg_catalog.setval()`` on the
target database with the information obtained on the source database.
Expand All @@ -166,6 +170,9 @@ The ``pgcopydb clone`` command implements the following steps:
--use-list`` option so that indexes and primary key constraints
already created in steps 6 and 7 are properly skipped now.

This step uses as many as ``--restore-jobs`` jobs for ``pg_restore`` to
share the workload and restore the objects in parallel.

.. _superuser:

Postgres privileges, superuser, and dump and restore
Expand Down Expand Up @@ -423,6 +430,15 @@ The following options are available to ``pgcopydb clone``:
Postgres target system, minus some cores that are going to be used for
handling the COPY operations.

--restore-jobs

How many threads or processes can be used during pg_restore. A good option is
to set this option to the count of CPU cores that are available on the
Postgres target system.

If this value is not set, we reuse the ``--index-jobs`` value. If that value
is not set either, we use the the default value for ``--index-jobs``.

--large-object-jobs

How many worker processes to start to copy Large Objects concurrently.
Expand Down Expand Up @@ -715,6 +731,12 @@ PGCOPYDB_INDEX_JOBS
parallel. When ``--index-jobs`` is ommitted from the command line, then
this environment variable is used.

PGCOPYDB_RESTORE_JOBS

Number of concurrent jobs allowed to run `pg_restore` operations in
parallel. When ``--restore-jobs`` is ommitted from the command line, then
this environment variable is used.

PGCOPYDB_LARGE_OBJECTS_JOBS

Number of concurrent jobs allowed to copy Large Objects data in parallel.
Expand Down
13 changes: 11 additions & 2 deletions docs/ref/pgcopydb_copy.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ also :ref:`pgcopydb_clone`.
--dir Work directory to use
--table-jobs Number of concurrent COPY jobs to run
--index-jobs Number of concurrent CREATE INDEX jobs to run
--restore-jobs Number of concurrent jobs for pg_restore
--drop-if-exists On the target database, clean-up from a previous run first
--roles Also copy roles found on source to target
--no-owner Do not set ownership of objects to match the original database
Expand Down Expand Up @@ -169,6 +170,7 @@ steps.
--dir Work directory to use
--table-jobs Number of concurrent COPY jobs to run
--index-jobs Number of concurrent CREATE INDEX jobs to run
--restore-jobs Number of concurrent jobs for pg_restore
--drop-if-exists On the target database, clean-up from a previous run first
--no-owner Do not set ownership of objects to match the original database
--skip-large-objects Skip copying large objects (blobs)
Expand Down Expand Up @@ -300,6 +302,7 @@ target database.
--target Postgres URI to the target database
--dir Work directory to use
--index-jobs Number of concurrent CREATE INDEX jobs to run
--restore-jobs Number of concurrent jobs for pg_restore
--restart Allow restarting when temp files exist already
--resume Allow resuming operations after a failure
--not-consistent Allow taking a new snapshot on the source database
Expand All @@ -320,8 +323,8 @@ is found existing already on the target database.

::

pgcopydb copy indexes: Create all the indexes found in the source database in the target
usage: pgcopydb copy indexes --source ... --target ... [ --table-jobs ... --index-jobs ... ]
pgcopydb copy constraints: Create all the constraints found in the source database in the target
usage: pgcopydb copy constraints --source ... --target ... [ --table-jobs ... --index-jobs ... ]

--source Postgres URI to the source database
--target Postgres URI to the target database
Expand Down Expand Up @@ -532,6 +535,12 @@ PGCOPYDB_INDEX_JOBS
parallel. When ``--index-jobs`` is ommitted from the command line, then
this environment variable is used.

PGCOPYDB_RESTORE_JOBS

Number of concurrent jobs allowed to run `pg_restore` operations in
parallel. When ``--restore-jobs`` is ommitted from the command line, then
this environment variable is used.

PGCOPYDB_LARGE_OBJECTS_JOBS

Number of concurrent jobs allowed to copy Large Objects data in parallel.
Expand Down
13 changes: 13 additions & 0 deletions docs/ref/pgcopydb_restore.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@ be fed with the directory output from the ``pgcopydb dump ...`` commands.
--source Postgres URI to the source database
--target Postgres URI to the target database
--dir Work directory to use
--restore-jobs Number of concurrent jobs for pg_restore
--drop-if-exists On the target database, clean-up from a previous run first
--no-owner Do not set ownership of objects to match the original database
--no-acl Prevent restoration of access privileges (grant/revoke commands).
Expand Down Expand Up @@ -67,6 +68,7 @@ be fed with the directory output from the ``pgcopydb dump ...`` commands.
--source Postgres URI to the source database
--target Postgres URI to the target database
--dir Work directory to use
--restore-jobs Number of concurrent jobs for pg_restore
--drop-if-exists On the target database, clean-up from a previous run first
--no-owner Do not set ownership of objects to match the original database
--no-acl Prevent restoration of access privileges (grant/revoke commands).
Expand Down Expand Up @@ -96,6 +98,7 @@ be fed with the directory output from the ``pgcopydb dump ...`` commands.
--source Postgres URI to the source database
--target Postgres URI to the target database
--dir Work directory to use
--restore-jobs Number of concurrent jobs for pg_restore
--no-owner Do not set ownership of objects to match the original database
--no-acl Prevent restoration of access privileges (grant/revoke commands).
--no-comments Do not output commands to restore comments
Expand Down Expand Up @@ -129,6 +132,7 @@ target database.
--source Postgres URI to the source database
--target Postgres URI to the target database
--dir Work directory to use
--restore-jobs Number of concurrent jobs for pg_restore


.. _pgcopydb_restore_parse_list:
Expand Down Expand Up @@ -206,6 +210,15 @@ The following options are available to ``pgcopydb restore schema``:
``${TMPDIR}/pgcopydb`` when the environment variable is set, or
then to ``/tmp/pgcopydb``.

--restore-jobs

How many threads or processes can be used during pg_restore. A good option is
to set this option to the count of CPU cores that are available on the
Postgres target system.

If this value is not set, we reuse the ``--index-jobs`` value. If that value
is not set either, we use the the default value for ``--index-jobs``.

--drop-if-exists

When restoring the schema on the target Postgres instance, ``pgcopydb``
Expand Down
1 change: 1 addition & 0 deletions src/bin/pgcopydb/cli_clone_follow.c
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
" --dir Work directory to use\n" \
" --table-jobs Number of concurrent COPY jobs to run\n" \
" --index-jobs Number of concurrent CREATE INDEX jobs to run\n" \
" --restore-jobs Number of concurrent jobs for pg_restore\n" \
" --large-objects-jobs Number of concurrent Large Objects jobs to run\n" \
" --split-tables-larger-than Same-table concurrency size threshold\n" \
" --drop-if-exists On the target database, clean-up from a previous run first\n" \
Expand Down
46 changes: 45 additions & 1 deletion src/bin/pgcopydb/cli_common.c
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,28 @@ cli_copydb_getenv(CopyDBOptions *options)
}
}

if (env_exists(PGCOPYDB_RESTORE_JOBS))
{
char jobs[BUFSIZE] = { 0 };

if (get_env_copy(PGCOPYDB_RESTORE_JOBS, jobs, sizeof(jobs)))
{
if (!stringToInt(jobs, &options->restoreOptions.jobs) ||
options->restoreOptions.jobs < 1 ||
options->restoreOptions.jobs > 128)
{
log_fatal("Failed to parse PGCOPYDB_RESTORE_JOBS: \"%s\"",
jobs);
++errors;
}
}
else
{
/* errors have already been logged */
++errors;
}
}

if (env_exists(PGCOPYDB_LARGE_OBJECTS_JOBS))
{
char jobs[BUFSIZE] = { 0 };
Expand Down Expand Up @@ -713,6 +735,7 @@ cli_copy_db_getopts(int argc, char **argv)
{ "no-role-passwords", no_argument, NULL, 'P' },
{ "no-owner", no_argument, NULL, 'O' }, /* pg_restore -O */
{ "no-comments", no_argument, NULL, 'X' },
{ "restore-jobs", required_argument, NULL, 'j' }, /* pg_restore --jobs */
{ "no-acl", no_argument, NULL, 'x' }, /* pg_restore -x */
{ "skip-blobs", no_argument, NULL, 'B' },
{ "skip-large-objects", no_argument, NULL, 'B' },
Expand Down Expand Up @@ -750,6 +773,7 @@ cli_copy_db_getopts(int argc, char **argv)
/* install default values */
options.tableJobs = DEFAULT_TABLE_JOBS;
options.indexJobs = DEFAULT_INDEX_JOBS;
options.restoreOptions.jobs = DEFAULT_RESTORE_JOBS;
options.lObjectJobs = DEFAULT_LARGE_OBJECTS_JOBS;
options.splitTablesLargerThan.bytes = DEFAULT_SPLIT_TABLES_LARGER_THAN;

Expand All @@ -761,7 +785,7 @@ cli_copy_db_getopts(int argc, char **argv)
}

while ((c = getopt_long(argc, argv,
"S:T:D:J:I:b:L:cOBemlirRCN:xXCtfo:p:s:E:F:Q:iVvdzqh",
"S:T:D:J:I:b:L:cOBemlirRCN:xXj:Ctfo:p:s:E:F:Q:iVvdzqh",
long_options, &option_index)) != -1)
{
switch (c)
Expand Down Expand Up @@ -900,6 +924,19 @@ cli_copy_db_getopts(int argc, char **argv)
break;
}

case 'j':
{
if (!stringToInt(optarg, &options.restoreOptions.jobs) ||
options.restoreOptions.jobs < 1 ||
options.restoreOptions.jobs > 128)
{
log_fatal("Failed to parse --restore-jobs count: \"%s\"", optarg);
++errors;
}
log_trace("--restore-jobs %d", options.restoreOptions.jobs);
break;
}

case 'B':
{
options.skipLargeObjects = true;
Expand Down Expand Up @@ -1133,6 +1170,13 @@ cli_copy_db_getopts(int argc, char **argv)
}
}

/* if we haven't set restore-jobs, set it to index-jobs */
if (options.restoreOptions.jobs == DEFAULT_RESTORE_JOBS)
{
options.restoreOptions.jobs = options.indexJobs;
log_trace("--restore-jobs %d", options.indexJobs);
}

if (options.connStrings.source_pguri == NULL ||
options.connStrings.target_pguri == NULL)
{
Expand Down
3 changes: 3 additions & 0 deletions src/bin/pgcopydb/cli_copy.c
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ static CommandLine copy_db_command =
" --dir Work directory to use\n"
" --table-jobs Number of concurrent COPY jobs to run\n"
" --index-jobs Number of concurrent CREATE INDEX jobs to run\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n"
" --drop-if-exists On the target database, clean-up from a previous run first\n"
" --roles Also copy roles found on source to target\n"
" --no-owner Do not set ownership of objects to match the original database\n"
Expand Down Expand Up @@ -108,6 +109,7 @@ static CommandLine copy_data_command =
" --dir Work directory to use\n"
" --table-jobs Number of concurrent COPY jobs to run\n"
" --index-jobs Number of concurrent CREATE INDEX jobs to run\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n"
" --skip-large-objects Skip copying large objects (blobs)\n"
" --filters <filename> Use the filters defined in <filename>\n"
" --restart Allow restarting when temp files exist already\n"
Expand Down Expand Up @@ -176,6 +178,7 @@ static CommandLine copy_indexes_command =
" --target Postgres URI to the target database\n"
" --dir Work directory to use\n"
" --index-jobs Number of concurrent CREATE INDEX jobs to run\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n"
" --filters <filename> Use the filters defined in <filename>\n"
" --restart Allow restarting when temp files exist already\n"
" --resume Allow resuming operations after a failure\n"
Expand Down
33 changes: 31 additions & 2 deletions src/bin/pgcopydb/cli_restore.c
3D11
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ static CommandLine restore_schema_command =
" --source Postgres URI to the source database\n"
" --target Postgres URI to the target database\n"
" --dir Work directory to use\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n"
" --drop-if-exists On the target database, clean-up from a previous run first\n"
" --no-owner Do not set ownership of objects to match the original database\n"
" --no-acl Prevent restoration of access privileges (grant/revoke commands).\n"
Expand All @@ -57,6 +58,7 @@ static CommandLine restore_schema_pre_data_command =
" --source Postgres URI to the source database\n"
" --target Postgres URI to the target database\n"
" --dir Work directory to use\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n"
" --drop-if-exists On the target database, clean-up from a previous run first\n"
" --no-owner Do not set ownership of objects to match the original database\n"
" --no-acl Prevent restoration of access privileges (grant/revoke commands).\n"
Expand All @@ -76,6 +78,7 @@ static CommandLine restore_schema_post_data_command =
" --source Postgres URI to the source database\n"
" --target Postgres URI to the target database\n"
" --dir Work directory to use\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n"
" --no-owner Do not set ownership of objects to match the original database\n"
" --no-acl Prevent restoration of access privileges (grant/revoke commands).\n"
" --no-comments Do not output commands to restore comments\n"
Expand All @@ -93,7 +96,8 @@ static CommandLine restore_roles_command =
" --dir <dir> [ --source <URI> ] --target <URI> ",
" --source Postgres URI to the source database\n"
" --target Postgres URI to the target database\n"
" --dir Work directory to use\n",
" --dir Work directory to use\n"
" --restore-jobs Number of concurrent jobs for pg_restore\n",
cli_restore_schema_getopts,
cli_restore_roles);

Expand Down Expand Up @@ -148,6 +152,7 @@ cli_restore_schema_getopts(int argc, char **argv)
{ "drop-if-exists", no_argument, NULL, 'c' }, /* pg_restore -c */
{ "no-owner", no_argument, NULL, 'O' }, /* pg_restore -O */
{ "no-comments", no_argument, NULL, 'X' },
{ "restore-jobs", required_argument, NULL, 'j' }, /* pg_restore --jobs */
{ "no-acl", no_argument, NULL, 'x' }, /* pg_restore -x */
{ "filter", required_argument, NULL, 'F' },
{ "filters", required_argument, NULL, 'F' },
Expand All @@ -170,14 +175,18 @@ cli_restore_schema_getopts(int argc, char **argv)

optind = 0;

/* install default values */
options.indexJobs = DEFAULT_INDEX_JOBS;
options.restoreOptions.jobs = DEFAULT_RESTORE_JOBS;

/* read values from the environment */
if (!cli_copydb_getenv(&options))
{
log_fatal("Failed to read default values from the environment");
exit(EXIT_CODE_BAD_ARGS);
}

while ((c = getopt_long(argc, argv, "S:T:D:s:cOxXFeErRCNVvdzqh",
while ((c = getopt_long(argc, argv, "S:T:D:s:cOj:xXFeErRCNVvdzqh",
long_options, &option_index)) != -1)
{
switch (c)
Expand Down Expand Up @@ -229,6 +238,19 @@ cli_restore_schema_getopts(int argc, char **argv)
break;
}

case 'j':
{
if (!stringToInt(optarg, &options.restoreOptions.jobs) ||
options.restoreOptions.jobs < 1 ||
options.restoreOptions.jobs > 128)
{
log_fatal("Failed to parse --restore-jobs count: \"%s\"", optarg);
++errors;
}
log_trace("--restore-jobs %d", options.restoreOptions.jobs);
break;
}

case 'x':
{
options.restoreOptions.noACL = true;
Expand Down Expand Up @@ -384,6 +406,13 @@ cli_restore_schema_getopts(int argc, char **argv)
exit(EXIT_CODE_BAD_ARGS);
}

/* if we haven't set restore-jobs, set it to index-jobs */
if (options.restoreOptions.jobs == DEFAULT_RESTORE_JOBS)
{
options.restoreOptions.jobs = options.indexJobs;
log_trace("--restore-jobs %d", options.indexJobs);
}

/* publish our option parsing in the global variable */
restoreDBoptions = options;

Expand Down
Loading
0