Parallelize pg_restore operations by hanefi · Pull Request #561 · dimitri/pgcopydb · GitHub

Parallelize pg_restore operations #561


Merged
merged 1 commit into from
Dec 6, 2023

Conversation

@hanefi (Contributor) commented Nov 30, 2023

This commit adds a new option --restore-jobs to pgcopydb that allows specifying how many jobs can be used to run pg_restore operations in parallel. This option can also be set using the PGCOPYDB_RESTORE_JOBS environment variable.

When this option is set to 1, pgcopydb will run pg_restore with the --single-transaction option, and pgcopydb will behave the same as it used to. Otherwise, pg_restore will be run with the --jobs option set to the number of jobs specified by the user.
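The two modes are mutually exclusive on the pg_restore side as well: pg_restore does not accept --jobs together with --single-transaction. A minimal Python sketch of the selection rule described above follows. It is an illustration only, not the pgcopydb C implementation, and the function name `pg_restore_mode_args` is hypothetical.

```python
from __future__ import annotations


def pg_restore_mode_args(restore_jobs: int) -> list[str]:
    """Return the pg_restore flags implied by a --restore-jobs value.

    Hypothetical helper for illustration; pgcopydb itself is written in C.
    """
    if restore_jobs < 1:
        raise ValueError("restore_jobs must be a positive integer")
    if restore_jobs == 1:
        # Previous behavior: a single connection and a single transaction.
        return ["--single-transaction"]
    # pg_restore rejects --jobs combined with --single-transaction,
    # so the parallel path passes --jobs only.
    return ["--jobs", str(restore_jobs)]


if __name__ == "__main__":
    print(pg_restore_mode_args(1))
    print(pg_restore_mode_args(4))
```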

If the user neither supplies the --restore-jobs option nor sets the environment variable, pgcopydb will use the --index-jobs value as the default.

Fixes: #539

@dimitri (Owner) left a comment

Nice, thanks!

I think a good default value for --restore-jobs should be the current value for --index-jobs (dynamic, from the command line or the environment), because this impacts the target server only.

Also, I'm not sure we need --single-transaction on the target server at all, actually, but I think I like the way you chose to still have it when --restore-jobs is 1.

@hanefi force-pushed the pgrestore-jobs-param branch from f74b973 to f347558 on December 4, 2023 11:32
@hanefi marked this pull request as ready for review on December 4, 2023 21:16
@hanefi (Contributor, Author) commented Dec 4, 2023

> I think a good default value for --restore-jobs should be the current value for --index-jobs (dynamic, from the command line or the environment), because this impacts the target server only.

Agreed. This was not a trivial change, as some pgcopydb restore commands did not have an --index-jobs parameter, but we still need to set a default value for --restore-jobs. I ended up checking for PGCOPYDB_INDEX_JOBS or --index-jobs, and if neither is set I use the default value for index jobs, which is 4. I think this is acceptable.
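The fallback chain can be sketched as follows. This is a hedged Python illustration, not the actual pgcopydb C code. The function name `resolve_restore_jobs` and its parameters are hypothetical. The precedence of a command-line option over its environment variable is an assumption, since the discussion does not state the order.

```python
import os
from typing import Mapping, Optional

DEFAULT_INDEX_JOBS = 4  # default for index jobs, as stated in the comment above


def resolve_restore_jobs(
    cli_restore_jobs: Optional[int] = None,
    cli_index_jobs: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Pick the effective restore-jobs value (hypothetical helper).

    Assumed order: --restore-jobs, PGCOPYDB_RESTORE_JOBS, --index-jobs,
    PGCOPYDB_INDEX_JOBS, then the built-in default of 4.
    """
    candidates = [
        cli_restore_jobs,
        env.get("PGCOPYDB_RESTORE_JOBS"),
        cli_index_jobs,
        env.get("PGCOPYDB_INDEX_JOBS"),
    ]
    for value in candidates:
        # Skip sources that are unset or set to an empty string.
        if value is not None and value != "":
            return int(value)
    return DEFAULT_INDEX_JOBS


if __name__ == "__main__":
    # Nothing set anywhere: fall back to the index-jobs default.
    print(resolve_restore_jobs(env={}))
    # Only the index-jobs environment variable is set.
    print(resolve_restore_jobs(env={"PGCOPYDB_INDEX_JOBS": "2"}))
```

Passing `env` explicitly keeps the sketch easy to exercise without touching the real process environment.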

> Also, I'm not sure we need --single-transaction on the target server at all, actually, but I think I like the way you chose to still have it when --restore-jobs is 1.

I think we should keep a workaround for users who want to keep the old behavior, where we always used --single-transaction. This may not really be needed, but it is nice to let them have that option.

@hanefi force-pushed the pgrestore-jobs-param branch from d38d114 to bb99942 on December 5, 2023 15:06
@hanefi force-pushed the pgrestore-jobs-param branch from b142fef to 7ace7db on December 6, 2023 11:40
@dimitri merged commit b0e61ca into dimitri:main on Dec 6, 2023
@dimitri assigned dimitri and hanefi and unassigned dimitri on Dec 6, 2023
@dimitri added the enhancement (New feature or request) label on Dec 6, 2023
@dimitri added this to the v0.15 milestone on Dec 6, 2023
@hanefi deleted the pgrestore-jobs-param branch on December 6, 2023 14:21
Development

Successfully merging this pull request may close these issues.

Can foreign key creation be parallelized?