[Blueprints] Import WXRs via the DataLiberation importer by adamziel · Pull Request #127 · WordPress/php-toolkit

[Blueprints] Import WXRs via the DataLiberation importer #127

Merged on Jun 4, 2025 (12 commits)

@adamziel (Collaborator) commented Jun 3, 2025

Adapts the import-markdown-directory.php script from the create-wp-site tool so that ImportContentStep can start using the Data Liberation importer in Blueprints v2.

For example, this Blueprint would use the Data Liberation importer:

{
	"version": 2,
	"content": [
		{
			"type": "wxr",
			"source": "https://raw.githubusercontent.com/wordpress/blueprints/trunk/blueprints/stylish-press/site-content.wxr"
		}
	]
}
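
To make the shape of that Blueprint concrete, here is a minimal Python sketch of how a runner might pick out the entries that would be routed to the WXR import path. It is only an illustration: `wxr_sources` is a hypothetical helper, not part of php-toolkit, and the real selection logic lives in the PHP ImportContentStep.

```python
import json
from typing import List

# The example Blueprint from this PR description, embedded as a string.
BLUEPRINT_JSON = """
{
    "version": 2,
    "content": [
        {
            "type": "wxr",
            "source": "https://raw.githubusercontent.com/wordpress/blueprints/trunk/blueprints/stylish-press/site-content.wxr"
        }
    ]
}
"""


def wxr_sources(blueprint: dict) -> List[str]:
    """Return the "source" of every content entry whose type is "wxr".

    Hypothetical helper for illustration only.
    """
    return [
        entry["source"]
        for entry in blueprint.get("content", [])
        if entry.get("type") == "wxr"
    ]


if __name__ == "__main__":
    for source in wxr_sources(json.loads(BLUEPRINT_JSON)):
        print(source)
```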

Implementation

The pipeline goes like this:

  • The runner puts php-toolkit.phar in the target site directory
  • The step handler creates the import-markdown-directory.php script in the target site directory
  • The step handler buffers the entire WXR file into the target site directory
  • The step handler runs the import-markdown-directory.php script in a subprocess, pointing it to the buffered WXR file
  • The import-markdown-directory.php script writes JSON messages to a text file, one message per line. The step handler reads and decodes them, then reports the progress and any errors back to the user. This is handled by the new WordPress\Blueprints\Process class, which extends the Symfony Process class.
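
The one-message-per-line protocol in the last step can be sketched in a few lines of Python. This is not the code from WordPress\Blueprints\Process. The message fields used here (`type`, `progress`, `message`) are assumptions made for the example, since the PR description does not specify the message schema.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_messages(lines: Iterable[str]) -> Iterator[Dict]:
    """Decode one JSON object per non-empty line.

    Lines that are not valid JSON are skipped, since the writer may
    still be in the middle of appending the last line.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(message, dict):
            yield message


def split_progress_and_errors(messages: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate messages by their (assumed) "type" field."""
    progress, errors = [], []
    for message in messages:
        if message.get("type") == "error":
            errors.append(message)
        elif message.get("type") == "progress":
            progress.append(message)
    return progress, errors


if __name__ == "__main__":
    # Stand-in for the text file the importer script would write to.
    sample = [
        '{"type": "progress", "progress": 25}',
        '{"type": "progress", "progress": 100}',
        '{"type": "error", "message": "Could not download attachment"}',
    ]
    progress, errors = split_progress_and_errors(parse_messages(sample))
    print(len(progress), "progress updates,", len(errors), "errors")
```

A line-delimited format keeps the reader simple: each line can be decoded on its own, so the step handler never needs to parse a partially written document.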

This PR also includes a few bugfixes in the Data Liberation pipeline.

Remaining work

  • Use the same data reference resolution mechanism in Blueprints and in the importer to support sourcing media files from any remote execution context (e.g. a git repo).
  • Pass progress updates from the importer script to the Blueprint runner
  • Stream the WXR file directly from its reference. Do not buffer it first. Ditto for the "type": "posts" import mode.
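
To illustrate the difference between buffering a WXR file and streaming it, here is a minimal Python sketch that parses an export in fixed-size chunks using the standard library's `XMLPullParser`. It does not reflect how the Data Liberation importer is implemented in PHP. The `stream_item_titles` helper and the tiny sample document are assumptions made for the example.

```python
import io
from typing import BinaryIO, Iterator
from xml.etree.ElementTree import XMLPullParser


def stream_item_titles(source: BinaryIO, chunk_size: int = 8192) -> Iterator[str]:
    """Yield the <title> of each <item> while reading `source` in chunks.

    Only one chunk, plus the elements still being parsed, is handled at a
    time. The whole file is never read into a single string first.
    """
    parser = XMLPullParser(events=("end",))
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        for _event, element in parser.read_events():
            if element.tag == "item":
                yield element.findtext("title", default="")
                # Drop the item's children once it has been handled.
                element.clear()
    parser.close()


if __name__ == "__main__":
    # A tiny stand-in for a WXR export; a real one carries many more fields.
    sample = (
        b"<rss><channel>"
        b"<item><title>Hello</title></item>"
        b"<item><title>World</title></item>"
        b"</channel></rss>"
    )
    for title in stream_item_titles(io.BytesIO(sample), chunk_size=16):
        print(title)
```

The same function would accept any binary file-like object, such as an HTTP response body, which is what makes reading directly from the reference possible without an intermediate copy on disk.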

Follow-up work

  • Source php-toolkit.phar from somewhere in a production blueprints.phar release. GitHub Releases maybe?
  • Pass configuration options to the importer, e.g. allowed media domains, author mapping mode etc.
  • Use the Data Liberation importer also for importing posts.

@adamziel force-pushed the push-qvpwqtzqzyrw branch from 9f1d107 to 7f13855 on June 3, 2025 22:49
@adamziel marked this pull request as ready for review on June 4, 2025 14:03
@adamziel merged commit b18dbb2 into trunk on Jun 4, 2025
18 of 21 checks passed
The github-project-automation bot moved this from Inbox to Done in Playground Board on Jun 4, 2025
adamziel added a commit that referenced this pull request Jun 9, 2025
adamziel added a commit that referenced this pull request Jun 9, 2025