8000 Musicbrainz Schema update v30 by frisi · Pull Request #20 · acoustid/mbslave · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Musicbrainz Schema update v30 #20

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

frisi
Copy link
@frisi frisi commented May 27, 2025

resolves #19

the manually entered search path definitions for functions would conflict on every run of scripts/update_sql.sh
without them, the sync process could fail with errors like this if you configure a username different to musicbrainz

psycopg2.errors.UndefinedFunction: function set_mediums_recordings_first_release_dates(integer[]) does not exist
CONTEXT:  PL/pgSQL function musicbrainz.a_upd_medium_mirror() line 6 at PERFORM

the reason is that running mbslave sync configures the database session without set_search_path and the default search path for the custom username does not match the musicbrainz schema anymore.

def mbslave_sync_main(config: Config, args: argparse.Namespace) -> None:
    db = connect_db(config)

as a workaround you can set the searchpath for the database user configured in mbslave.cfg option database.user

[database]
port=5432
user=foo

on user level

-- check user level settings via psql
\drds foo;

-- set search path on user level
ALTER USER foo SET search_path TO musicbrainz, public;

Harald Friessnegger added 4 commits May 26, 2025 14:06
note that this reverts the adaptions done by acoustid#8

as musicbrainz does not ship the scripts with function definitions
containing a `SET schema` local adaptions would always lead to merge
problems.

if users have problems that functions do not use the correct schema for
queries and can't find functions or tables, consider to set the schema on session,
user or database level:

```
-- session
SET search_path to musicbrainz,public;

-- user level
ALTER USER musicbrainz SET search_path TO musicbrainz, public;

-- check user level settings via psql
\drds mbdata;

-- check user level via query
SELECT
    r.rolname AS role_name,
    rs.setconfig AS configured_settings
FROM
    pg_roles r
LEFT JOIN
    pg_db_role_setting rs ON r.oid = rs.setrole
WHERE
    r.rolname = 'musicbrainz';

-- db level
SELECT setting FROM pg_settings WHERE name = 'search_path';
ALTER DATABASE database_name SET search_path TO musicbrainz, public;
```
musicbrainz has fixed upgrade issues
see metabrainz/musicbrainz-server@c7b8c29
Copy link
coderabbitai bot commented May 27, 2025

Walkthrough

This update implements the MusicBrainz schema change for version 30.0.0, covering all required SQL migrations, new and updated functions, triggers, constraints, and documentation. It migrates the schema, updates partitioned tables, adds new columns and tables, adjusts indexes, and provides upgrade instructions, fully aligning the codebase with the latest MusicBrainz schema release.

Changes

Files / Areas Change Summary
mbsclave/init.py, CHANGELOG.rst Bumped version to 30.0.0 and added a changelog entry for schema change 30.
README.rst Fixed typos, clarified environment variable usage, added advisory on custom PostgreSQL search_path, and documented schema upgrade steps for version 30.0.0.
.../CreateTables.sql, .../CreateIndexes.sql, .../CreatePrimaryKeys.sql, .../CreateFKConstraints.sql, .../CreateTriggers.sql, .../CreateReplicationTriggers.sql, .../CreateReplicationTriggers2.sql, .../TruncateTables.sql, .../InsertTestData.sql, .../DropTables.sql, .../DropIndexes.sql, .../DropPrimaryKeys.sql, .../DropFKConstraints.sql, .../DropReplicationTriggers.sql, .../DropReplicationTriggers2.sql, .../DropTriggers.sql, .../DropMirrorOnlyTriggers.sql Added/removed columns and tables (notably medium_gid_redirect, new gid on medium), changed indexes (including unique index on medium(gid)), added/dropped triggers and constraints for new/updated tables, updated test data for new schema, and ensured all drop/truncate scripts account for new objects.
.../CreateFunctions.sql, .../CreateMirrorOnlyFunctions.sql, .../DropFunctions.sql Added/updated functions for new triggers, removed SET search_path from function declarations, simplified delete_unused_url, added trigger functions for new partitioned tables, and updated artist/release group row retrieval logic.
.../caa/CreateMQTriggers.sql, .../caa/DropMQTriggers.sql, .../eaa/CreateMQTriggers.sql, .../eaa/DropMQTriggers.sql Removed all AMQP trigger functions and triggers for cover art and event art archives, and dropped related objects.
.../caa/CreateViews.sql, .../eaa/CreateViews.sql Changed logic in views to include entries with MIME type 'application/pdf' as possible front cover/event art.
.../updates/2024*.sql, .../updates/schema-change/30.*.sql Added migration scripts for all schema changes: partitioned tables, new columns, new triggers, new/updated functions, new views, new constraints, and data migration for new UUIDs.

Assessment against linked issues

Objective Addressed Explanation
Update schema and code to match MusicBrainz schema change 30.0.0 (Issue #19)
Add/adjust SQL migration scripts, triggers, constraints, and functions per schema 30 requirements (#19)
Update documentation (README, changelog) to reflect schema 30 upgrade steps and requirements (#19)
Remove/replace deprecated triggers, functions, and views as per schema 30 changes (#19)
Ensure new/changed tables, columns, indexes, and test data are present and consistent with schema 30 (#19)
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 22

🔭 Outside diff range comments (1)
mbslave/sql/CreateMirrorOnlyTriggers.sql (1)

1-2: 🛠️ Refactor suggestion

Set search_path for mirror trigger definitions

This file creates triggers on unqualified tables (medium, release_group_primary_type, etc.) without setting the schema. On non-default search_path configurations, these DDLs will error out. Please add:

SET search_path = musicbrainz, public;

right after \set ON_ERROR_STOP 1.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6d1686 and 4c601fc.

📒 Files selected for processing (46)
  • CHANGELOG.rst (1 hunks)
  • README.rst (4 hunks)
  • mbslave/__init__.py (1 hunks)
  • mbslave/sql/CreateFKConstraints.sql (1 hunks)
  • mbslave/sql/CreateFunctions.sql (102 hunks)
  • mbslave/sql/CreateIndexes.sql (3 hunks)
  • mbslave/sql/CreateMirrorOnlyFunctions.sql (12 hunks)
  • mbslave/sql/CreateMirrorOnlyTriggers.sql (3 hunks)
  • mbslave/sql/CreatePrimaryKeys.sql (1 hunks)
  • mbslave/sql/CreateReplicationTriggers.sql (1 hunks)
  • mbslave/sql/CreateReplicationTriggers2.sql (1 hunks)
  • mbslave/sql/CreateTables.sql (4 hunks)
  • mbslave/sql/CreateTriggers.sql (3 hunks)
  • mbslave/sql/DropFKConstraints.sql (1 hunks)
  • mbslave/sql/DropFunctions.sql (1 hunks)
  • mbslave/sql/DropIndexes.sql (1 hunks)
  • mbslave/sql/DropMirrorOnlyTriggers.sql (2 hunks)
  • mbslave/sql/DropPrimaryKeys.sql (1 hunks)
  • mbslave/sql/DropReplicationTriggers.sql (1 hunks)
  • mbslave/sql/Dro 8000 pReplicationTriggers2.sql (1 hunks)
  • mbslave/sql/DropTables.sql (1 hunks)
  • mbslave/sql/DropTriggers.sql (2 hunks)
  • mbslave/sql/InsertTestData.sql (4 hunks)
  • mbslave/sql/TruncateTables.sql (1 hunks)
  • mbslave/sql/caa/CreateMQTriggers.sql (0 hunks)
  • mbslave/sql/caa/CreateViews.sql (0 hunks)
  • mbslave/sql/caa/DropMQTriggers.sql (0 hunks)
  • mbslave/sql/eaa/CreateMQTriggers.sql (0 hunks)
  • mbslave/sql/eaa/CreateViews.sql (0 hunks)
  • mbslave/sql/eaa/DropMQTriggers.sql (0 hunks)
  • mbslave/sql/updates/20240221-mbs-13492.sql (1 hunks)
  • mbslave/sql/updates/20240726-mbs-9373.sql (1 hunks)
  • mbslave/sql/updates/20241017-mbs-9253-13464.sql (1 hunks)
  • mbslave/sql/updates/20241017-mbs-9253-master_and_standalone.sql (1 hunks)
  • mbslave/sql/updates/20241017-mbs-9253-mirror_only.sql (1 hunks)
  • mbslave/sql/updates/20241125-mbs-13832.sql (1 hunks)
  • mbslave/sql/updates/20250320-mbs-13768-fks.sql (1 hunks)
  • mbslave/sql/updates/20250320-mbs-13768.sql (1 hunks)
  • mbslave/sql/updates/20250408-mbs-13322.sql (1 hunks)
  • mbslave/sql/updates/20250408-mbs-13964-all.sql (1 hunks)
  • mbslave/sql/updates/20250425-mbs-13464-master_and_standalone.sql (1 hunks)
  • mbslave/sql/updates/20250425-mbs-13464.sql (1 hunks)
  • mbslave/sql/updates/20250425-mbs-13966.sql (1 hunks)
  • mbslave/sql/updates/schema-change/30.all.sql (1 hunks)
  • mbslave/sql/updates/schema-change/30.master_and_standalone.sql (1 hunks)
  • mbslave/sql/updates/schema-change/30.mirror_only.sql (1 hunks)
💤 Files with no reviewable changes (6)
  • mbslave/sql/caa/CreateViews.sql
  • mbslave/sql/eaa/CreateViews.sql
  • mbslave/sql/eaa/DropMQTriggers.sql
  • mbslave/sql/caa/DropMQTriggers.sql
  • mbslave/sql/eaa/CreateMQTriggers.sql
  • mbslave/sql/caa/CreateMQTriggers.sql
🔇 Additional comments (65)
CHANGELOG.rst (1)

1-2: Version entry added correctly
The new version header and underline match the existing RST style.

mbslave/__init__.py (1)

4-4: Bumped package version to 30.0.0
Matches the new schema release. Ensure downstream packaging and distribution are updated accordingly.

mbslave/sql/DropTables.sql (1)

271-271: Remove medium_gid_redirect table on downgrade
Good addition to match the new table. As with indexes, consider guarding against missing tables with IF EXISTS for idempotent downgrades.

mbslave/sql/DropReplicationTriggers.sql (1)

207-207:

✅ Verification successful

Drop trigger for medium_gid_redirect replication
This complements the creation of reptg_medium_gid_redirect. Confirm the trigger name matches your create script.

Run:


🏁 Script executed:

#!/bin/bash
# Verify trigger definition exists
rg -n "CREATE TRIGGER.*reptg_medium_gid_redirect" -n mbslave/sql

Length of output: 155


Drop trigger name matches creation – verified
The DROP TRIGGER IF EXISTS reptg_medium_gid_redirect in DropReplicationTriggers.sql lines up with the CREATE TRIGGER "reptg_medium_gid_redirect" in mbslave/sql/CreateReplicationTriggers.sql:818. No action needed.

mbslave/sql/TruncateTables.sql (1)

271-271: Include medium_gid_redirect in truncation
Truncating medium_gid_redirect with RESTART IDENTITY CASCADE ensures its sequence resets alongside related data. Order looks consistent between medium_format and medium_index.

mbslave/sql/CreateFKConstraints.sql (1)

3009-3012: Add FK on medium_gid_redirect.new_id
Constraint medium_gid_redirect_fk_new_id enforces referential integrity against medium(id), matching patterns used for other *_gid_redirect tables.

mbslave/sql/CreatePrimaryKeys.sql (1)

267-267: Confirm primary key addition for medium_gid_redirect
This line adds the medium_gid_redirect_pkey constraint on the medium_gid_redirect table, aligning with its definition in CreateTables.sql and its removal in DropPrimaryKeys.sql. Everything matches up.

mbslave/sql/DropPrimaryKeys.sql (1)

267-267: Confirm drop of medium_gid_redirect primary key
This drops the medium_gid_redirect_pkey constraint you added in CreatePrimaryKeys.sql. Good symmetry for rollbacks.

mbslave/sql/DropReplicationTriggers2.sql (1)

207-207: Confirm drop of replication trigger for medium_gid_redirect
This line removes reptg2_medium_gid_redirect on medium_gid_redirect, matching the trigger created in CreateReplicationTriggers2.sql. Looks correct.

mbslave/sql/DropMirrorOnlyTriggers.sql (3)

1-3: Missing schema context for trigger operations
Without an explicit SET search_path or schema qualifiers, these DROP TRIGGER statements will only apply if the user’s default search path includes the musicbrainz schema. To avoid “trigger does not exist” errors under custom users, add a SET search_path = musicbrainz; at the top or qualify each table with the schema.


20-21: Old mirror triggers for primary/secondary types dropped
Dropping a_upd_release_group_primary_type_mirror and a_upd_release_group_secondary_type_mirror aligns with the new trigger definitions in the mirror-only scripts.


36-37: Old constraint triggers for primary/secondary types dropped
Removing apply_artist_release_group_pending_updates_mirror on both type tables prepares for the updated deferred triggers.

mbslave/sql/CreateReplicationTriggers2.sql (2)

1-3: Add schema search_path for replication triggers
These triggers operate in the musicbrainz schema. Insert a SET search_path = musicbrainz; immediately after \set ON_ERROR_STOP 1 to ensure FOR EACH ROW statements target the correct schema under non-default users.


818-821: New replication trigger for medium_gid_redirect
The trigger reptg2_medium_gid_redirect is correctly inserted in alphabetical order between medium_format and medium_index. It will fire on all DML and call dbmirror2.recordchange(), matching the pattern of other reptg2_ triggers.

mbslave/sql/updates/20240221-mbs-13492.sql (2)

1-5: Ensure correct schema and timeout settings
You disable the statement timeout locally, which is good for long-running updates. However, add SET search_path = musicbrainz; after \set ON_ERROR_STOP 1 to guarantee the editor table is found under custom users.


23-23: The COMMIT; correctly closes the transaction.

mbslave/sql/DropFKConstraints.sql (2)

1-3: Add schema search_path for constraint drops
Without a SET search_path = musicbrainz;, dropping medium_gid_redirect_fk_new_id may fail under non-default users. Either add the search path at the top or qualify the table with musicbrainz..


600-600: Drop of medium_gid_redirect foreign key
Dropping medium_gid_redirect_fk_new_id here aligns with the subsequent re-add in the migration. Good catch to maintain rollback capability.

mbslave/sql/updates/20250320-mbs-13768-fks.sql (2)

1-3: ON_ERROR_STOP and search_path set correctly
The psql meta-command \set ON_ERROR_STOP 1 and SET search_path = musicbrainz; ensure that the migration halts on errors and operates in the correct schema.


7-10: Foreign key constraint on medium_gid_redirect
Adding medium_gid_redirect_fk_new_id referencing medium(id) enforces referential integrity for the UID redirect table. Confirm that the medium_gid_redirect table exists and its new_id column is correctly typed.

mbslave/sql/InsertTestData.sql (5)

113-114: Include new gid column for medium test records
The two INSERTs into medium now supply the UUID gid, aligning test data with the updated schema. Looks correct and matches the existing release(id) references.


160-161: Add gid for media on release 2
These INSERTs for medium(id 3,4) include the new UUID column and reference a valid release=2. Good coverage of the new schema changes.


163-164: Add gid for media on release 3
Extends the test data for medium(id 5,6) with the new UUID field—consistent with the other medium entries.


245-245: New link record for redirect tests
INSERTing link(id 4, link_type 108, attribute_count 0) prepares for the new redirect logic. Please verify that link_type 108 exists in your test fixtures.


260-260: Link two new artists via l_artist_artist
The l_artist_artist row uses link = 4 and entity IDs 8,9, which are defined above. The ordering ensures the link exists before the referencing row.

mbslave/sql/CreateReplicationTriggers.sql (1)

818-820: Add replication trigger for medium_gid_redirect
This new trigger follows the pattern of other _gid_redirect tables and ensures changes in medium_gid_redirect are captured.

mbslave/sql/updates/20250425-mbs-13464-master_and_standalone.sql (2)

1-3: Ensure DDL runs in the correct schema
Setting search_path = musicbrainz upfront guarantees the ALTER statements apply to the intended schema, avoiding lookup issues for non-default users.


7-17: Enforce referential integrity on artist_release
Adding artist_release_fk_artist and artist_release_fk_release with ON DELETE CASCADE secures the link to artist(id) and release(id). Good clear constraints.

mbslave/sql/updates/20250408-mbs-13322.sql (3)

1-3: Stop on error and begin transaction
\set ON_ERROR_STOP 1 plus BEGIN; ensures the script fails fast and runs atomically.


5-13: Define delete_unused_url with FK-safe cleanup
The function deletes matching rows from url_gid_redirect and url, and only suppresses foreign‐key violations—other errors still bubble up. Logic is concise and focused.


15-15: Commit the function creation transaction
Wrapping the definition in COMMIT; finalizes the change in one atomic step.

README.rst (4)

11-11: Remove stray trailing space
Cleaned up the sentence about “replica of the MusicBrainz database.” for consistency.


39-39: Fix typo in list heading
Corrected “Alternativelly” to “Alternatively” to improve readability.


49-53: Document custom search_path requirement
Adds a clear advisory for non-default DB users with an example ALTER USER command—this directly addresses PR objectives.


93-99: Add v30 schema upgrade instructions
Outlines running 30.all.sql and updating replication_control—steps correctly match the new release and file names.

mbslave/sql/DropTriggers.sql (1)

197-198: Confirm drop of new release group type triggers

These DROP TRIGGER IF EXISTS statements correctly target the newly introduced triggers on release_group_primary_type and release_group_secondary_type, ensuring a clean slate before recreating them in the migration.

Also applies to: 391-392

mbslave/sql/CreateMirrorOnlyFunctions.sql (1)

12-12: Remove explicit search_path from mirror-only functions

Dropping redundant SET search_path clauses standardizes these functions alongside others in CreateFunctions.sql. No change to execution context is introduced.

Also applies to: 34-34, 43-43, 55-55, 70-70, 90-90, 104-104, 126-126, 134-134, 151-151, 160-160, 168-168, 185-185, 206-206, 223-223

mbslave/sql/updates/20250320-mbs-13768.sql (1)

21-29: Link medium_gid_redirect.new_id to medium.id via foreign key

The medium_gid_redirect table currently has no inline FK on new_id. As noted in the related FKs migration, confirm that new_id is constrained to medium(id) to maintain referential integrity.

mbslave/sql/updates/20250408-mbs-13964-all.sql (1)

29-32: Create trigger and truncate existing data

The trigger definition and immediate TRUNCATE recording_first_release_date correctly reset the mirror state for this migration.

mbslave/sql/updates/20250425-mbs-13966.sql (2)

38-58: Conditional update logic is sound

The UPDATE ... FROM includes an IS DISTINCT FROM filter, so only changed rows are written. This minimizes unnecessary updates.


67-67: Clear pending updates after bulk operations

Truncating artist_release_group_pending_update here prevents stale entries before triggers repopulate it. This aligns with the upgrade flow.

mbslave/sql/updates/20241125-mbs-13832.sql (1)

1-49: Explicit schema qualifiers ensure search_path independence

All referenced tables and views are fully schema-qualified (cover_art_archive.*, event_art_archive.*, musicbrainz.edit), so this migration won’t break when a user’s search_path is different. Nice work.

mbslave/sql/updates/schema-change/30.mirror_only.sql (1)

5-6: Explicit search_path set

By adding SET search_path = musicbrainz, public; you guarantee mirror-only triggers target the right schema. This aligns with best practices for multi‐schema deployments.

mbslave/sql/CreateTables.sql (3)

432-432: Include full artist name with correct collation
Adding name VARCHAR COLLATE musicbrainz NOT NULL to the artist_release partition replaces the old single-character sort field and ensures proper sorting under the MusicBrainz collation.


466-466: Maintain consistency for group name collation
The name VARCHAR COLLATE musicbrainz NOT NULL addition in artist_release_group mirrors artist_release and guarantees uniform sorting behavior.


2903-2904: Add track_count and gid to medium table
Introducing track_count INTEGER NOT NULL DEFAULT 0 and gid UUID NOT NULL meets the schema v30 requirements. The default ensures existing records aren’t NULL, and the UUID will be uniquely indexed elsewhere.

mbslave/sql/CreateTriggers.sql (3)

457-458: Add mirror update trigger for medium table
The new AFTER UPDATE trigger on medium calls a_upd_medium_mirror() to keep mirror tables in sync. Ensure the function exists and its logic matches the updated schema.


586-590: Introduce mirror update triggers for release group types
These triggers fire a_upd_release_group_primary_type_mirror() and a_upd_release_group_secondary_type_mirror() after updates, aligning the mirror tables with schema v30. Good addition.


1311-1322: Add deferrable constraint triggers for child_order changes
The constraint triggers on release_group_primary_type and release_group_secondary_type ensure pending updates run only when child_order differs. This avoids unnecessary updates.

mbslave/sql/updates/20241017-mbs-9253-master_and_standalone.sql (3)

1-5: Initialize transaction with error-stop and schema context
Using \set ON_ERROR_STOP 1 and SET search_path = musicbrainz ensures the script halts on error and works in the correct schema.


19-23: Create mirror update triggers for release_group types
These AFTER UPDATE triggers invoke mirror procedures for primary and secondary types, keeping mirror tables aligned.


25-35: Add deferrable constraint triggers on child_order changes
Deferrable triggers on release_group_primary_type/secondary_type fire only when child_order changes, optimizing update propagation.

mbslave/sql/updates/20250425-mbs-13464.sql (3)

1-4: Begin migration with error-stop
The script starts a transaction with \set ON_ERROR_STOP 1 and BEGIN;—standard practice.


5-9: Drop old function and tables in proper order
get_artist_release_rows is dropped before its underlying tables, avoiding dependency errors. Good.


22-27: Define partitions for artist_release
Sub-tables for FALSE and TRUE values complete the partition setup. Looks correct.

mbslave/sql/updates/schema-change/30.master_and_standalone.sql (6)

1-8: Initialize master/standalone migration
Setting schema search path, disabling statement timeout, and beginning the transaction ensures consistent execution across environments.


15-25: Enforce foreign keys on artist_release_group in master and standalone
Adds cascading deletes to maintain integrity across master and mirror setups.


27-32: Mirror update triggers for release_group types
Recreates triggers for primary and secondary type mirror updates in both master and standalone contexts.


33-43: Deferrable constraint triggers for child_order changes
Ensures pending updates fire only when child_order changes, across master and standalone.


51-61: Add cascading foreign keys on artist_release
Links artist_release.artistartist(id) and releaserelease(id) with ON DELETE CASCADE, aligning master and mirror schemas.


69-73: Add foreign key on medium_gid_redirect
Ensures medium_gid_redirect.new_id references medium(id), supporting the new UUID redirection mechanism.

mbslave/sql/CreateIndexes.sql (2)

510-510: Enforce uniqueness on medium.gid
A unique index on medium(gid) supports fast lookup and guarantees no duplicate UUIDs.


736-736: Index medium_gid_redirect(new_id)
This index speeds up joins and deletes on the medium_gid_redirect table, enabling efficient redirect lookups.

mbslave/sql/CreateFunctions.sql (1)

16-346: Normalize LANGUAGE specifiers and remove schema qualifications
Across all of these functions, the LANGUAGE declarations have been standardized (dropping redundant quotes around plpgsql, unifying capitalization for sql vs SQL, etc.), and the explicit SET search_path = musicbrainz, public clauses have been removed from their definitions. This cleans up the function definitions, avoids hidden schema-path dependencies, and aligns with the PR goal of moving search‐path configuration out of individual routines.

mbslave/sql/updates/schema-change/30.all.sql (1)

11-11: Explicit script‐level search_path is correct
The SET search_path = musicbrainz, public; at the top of this all-in-one migration properly ensures all subsequent unqualified references resolve to the musicbrainz schema. This aligns with the PR’s objective of externalizing search-path setup.

Version 30.0.0
==============

- Schema change 30.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Changelog detail could be enriched
Consider adding a brief note about requiring a user-level search_path change (e.g., ALTER USER foo SET search_path TO musicbrainz, public) to help DB admins avoid the function-not-found error.

🤖 Prompt for AI Agents
In CHANGELOG.rst at line 4, the changelog entry "Schema change 30." is too brief
and lacks important context. Enhance this entry by adding a note about the
necessity for a user-level search_path change, such as "ALTER USER foo SET
search_path TO musicbrainz, public", to inform database administrators about
this requirement and help prevent function-not-found errors.

Comment on lines +406 to +407
DROP INDEX medium_gid_redirect_idx_new_id;
DROP INDEX medium_idx_gid;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Drop index statements for new medium GID support
These lines clean up the two new indexes. Verify the names match those in your creation scripts. You might also consider using IF EXISTS to avoid failures during downgrades.

Run:


🏁 Script executed:

#!/bin/bash
# Verify index definitions exist
rg -n "CREATE.*medium_gid_redirect_idx_new_id" -n mbslave/sql
rg -n "CREATE.*medium_idx_gid" -n mbslave/sql

Length of output: 756


Add IF EXISTS to DROP INDEX statements for safer downgrades
Verified that both index names match their creation in:

  • mbslave/sql/CreateIndexes.sql: lines 510 & 736
  • mbslave/sql/updates/20250320-mbs-13768.sql
  • mbslave/sql/updates/schema-change/30.all.sql

Recommended fix in mbslave/sql/DropIndexes.sql (lines 406–407):

-DROP INDEX medium_gid_redirect_idx_new_id;
-DROP INDEX medium_idx_gid;
+DROP INDEX IF EXISTS medium_gid_redirect_idx_new_id;
+DROP INDEX IF EXISTS medium_idx_gid;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DROP INDEX medium_gid_redirect_idx_new_id;
DROP INDEX medium_idx_gid;
– DROP INDEX medium_gid_redirect_idx_new_id;
– DROP INDEX medium_idx_gid;
+ DROP INDEX IF EXISTS medium_gid_redirect_idx_new_id;
+ DROP INDEX IF EXISTS medium_idx_gid;
🤖 Prompt for AI Agents
In mbslave/sql/DropIndexes.sql around lines 406 to 407, the DROP INDEX
statements lack the IF EXISTS clause, which can cause errors if the indexes do
not exist during downgrades. Modify both DROP INDEX statements to include IF
EXISTS to safely attempt dropping the indexes only if they exist, preventing
failures in downgrade scripts.

Comment on lines +1 to +9
\set ON_ERROR_STOP 1

BEGIN;

UPDATE editor
SET privs = privs | 16384 -- set new voting disabled flag
WHERE (privs & 1024) > 0; -- where editor had editing disabled flag

COMMIT;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make the migration self-contained by qualifying schema or setting search_path
The UPDATE editor statement assumes the musicbrainz schema is in the user’s search_path. If it isn’t, this will fail. To harden the script, either add after line 1:

SET LOCAL search_path = musicbrainz, public;

or fully qualify the table:

UPDATE musicbrainz.editor

This ensures the migration runs reliably under any user.

🤖 Prompt for AI Agents
In mbslave/sql/updates/20240726-mbs-9373.sql lines 1 to 9, the UPDATE statement
on the editor table assumes the musicbrainz schema is in the search_path, which
may cause failures if it is not. To fix this, either add "SET LOCAL search_path
= musicbrainz, public;" immediately after line 1 or fully qualify the table name
as "musicbrainz.editor" in the UPDATE statement to ensure the migration runs
reliably regardless of the user's search_path.

Comment on lines +42 to +43
DROP FUNCTION a_upd_release_group_primary_type_mirror();
DROP FUNCTION a_upd_release_group_secondary_type_mirror();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Optional: suppress errors with IF EXISTS on drop functions
Since ON_ERROR_STOP is unset here, missing functions won’t halt execution but will generate noise. You could use:

DROP FUNCTION IF EXISTS a_upd_release_group_primary_type_mirror();
DROP FUNCTION IF EXISTS a_upd_release_group_secondary_type_mirror();

to keep the logs clean.

🤖 Prompt for AI Agents
In mbslave/sql/DropFunctions.sql around lines 42 to 43, the DROP FUNCTION
statements do not use IF EXISTS, which causes unnecessary error messages if the
functions are missing. Modify these statements to include IF EXISTS, changing
them to DROP FUNCTION IF EXISTS a_upd_release_group_primary_type_mirror(); and
DROP FUNCTION IF EXISTS a_upd_release_group_secondary_type_mirror(); to suppress
errors and keep logs clean.

Comment on lines +6 to +21
UPDATE editor
SET privs = privs | 8192 -- set new beginner flag
WHERE id != 4 -- avoid setting ModBot as beginner
AND NOT deleted
AND (
member_since > NOW() - INTERVAL '2 weeks'
OR
NOT EXISTS (
SELECT 1
FROM edit
WHERE edit.editor = editor.id
AND edit.autoedit = 0
AND edit.status = 2
OFFSET 9
)
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Refactor EXISTS check for performance
Using OFFSET 9 without a LIMIT will scan all edits beyond the ninth. Consider adding LIMIT 1:

AND NOT EXISTS (
  SELECT 1
    FROM edit
   WHERE edit.editor = editor.id
     AND edit.autoedit = 0
     AND edit.status = 2
  OFFSET 9 LIMIT 1
)
…

This stops at the 10th row and reduces work.

🤖 Prompt for AI Agents
In mbslave/sql/updates/20240221-mbs-13492.sql around lines 6 to 21, the NOT
EXISTS subquery uses OFFSET 9 without a LIMIT, causing it to scan all rows
beyond the ninth. To improve performance, add LIMIT 1 after OFFSET 9 in the
subquery so it stops checking after the 10th row, reducing unnecessary scanning.

Comment on lines +28 to +73
CREATE OR REPLACE FUNCTION get_artist_release_rows(
release_id INTEGER
) RETURNS SETOF artist_release AS $$
BEGIN
-- PostgreSQL 12 generates a vastly more efficient plan when only
-- one release ID is passed. A condition like `r.id = any(...)`
-- can be over 200x slower, even with only one release ID in the
-- array.
RETURN QUERY EXECUTE $SQL$
SELECT DISTINCT ON (ar.artist, r.id)
ar.is_track_artist,
ar.artist,
integer_date(rfrd.year, rfrd.month, rfrd.day) AS first_release_date,
array_agg(
DISTINCT rl.catalog_number ORDER BY rl.catalog_number
) FILTER (WHERE rl.catalog_number IS NOT NULL)::TEXT[] AS catalog_numbers,
min(iso.code ORDER BY iso.code)::CHAR(2) AS country_code,
left(regexp_replace(
(CASE r.barcode WHEN '' THEN '0' ELSE r.barcode END),
'[^0-9]+', '', 'g'
), 18)::BIGINT AS barcode,
r.name,
r.id
FROM (
SELECT FALSE AS is_track_artist, racn.artist, r.id AS release
FROM release r
JOIN artist_credit_name racn ON racn.artist_credit = r.artist_credit
UNION ALL
SELECT TRUE AS is_track_artist, tacn.artist, m.release
FROM medium m
JOIN track t ON t.medium = m.id
JOIN artist_credit_name tacn ON tacn.artist_credit = t.artist_credit
) ar
JOIN release r ON r.id = ar.release
LEFT JOIN release_first_release_date rfrd ON rfrd.release = r.id
LEFT JOIN release_label rl ON rl.release = r.id
LEFT JOIN release_country rc ON rc.release = r.id
LEFT JOIN iso_3166_1 iso ON iso.area = rc.country
$SQL$ || (CASE WHEN release_id IS NULL THEN '' ELSE 'WHERE r.id = $1' END) ||
$SQL$
GROUP BY ar.is_track_artist, ar.artist, rfrd.release, r.id
ORDER BY ar.artist, r.id, ar.is_track_artist
$SQL$
USING release_id;
END;
$$ LANGUAGE plpgsql;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Review dynamic SQL in get_artist_release_rows()
The concatenation of the WHERE clause lacks a leading space, causing ...rc.countryWHERE r.id = $1. Add a space or newline before WHERE to avoid syntax errors.

Proposed fix:

-    $SQL$ || (CASE WHEN release_id IS NULL THEN '' ELSE 'WHERE r.id = $1' END) ||
+    $SQL$ || (CASE WHEN release_id IS NULL THEN '' ELSE ' WHERE r.id = $1' END) ||
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
CREATE OR REPLACE FUNCTION get_artist_release_rows(
release_id INTEGER
) RETURNS SETOF artist_release AS $$
BEGIN
-- PostgreSQL 12 generates a vastly more efficient plan when only
-- one release ID is passed. A condition like `r.id = any(...)`
-- can be over 200x slower, even with only one release ID in the
-- array.
RETURN QUERY EXECUTE $SQL$
SELECT DISTINCT ON (ar.artist, r.id)
ar.is_track_artist,
ar.artist,
integer_date(rfrd.year, rfrd.month, rfrd.day) AS first_release_date,
array_agg(
DISTINCT rl.catalog_number ORDER BY rl.catalog_number
) FILTER (WHERE rl.catalog_number IS NOT NULL)::TEXT[] AS catalog_numbers,
min(iso.code ORDER BY iso.code)::CHAR(2) AS country_code,
left(regexp_replace(
(CASE r.barcode WHEN '' THEN '0' ELSE r.barcode END),
'[^0-9]+', '', 'g'
), 18)::BIGINT AS barcode,
r.name,
r.id
FROM (
SELECT FALSE AS is_track_artist, racn.artist, r.id AS release
FROM release r
JOIN artist_credit_name racn ON racn.artist_credit = r.artist_credit
UNION ALL
SELECT TRUE AS is_track_artist, tacn.artist, m.release
FROM medium m
JOIN track t ON t.medium = m.id
JOIN artist_credit_name tacn ON tacn.artist_credit = t.artist_credit
) ar
JOIN release r ON r.id = ar.release
LEFT JOIN release_first_release_date rfrd ON rfrd.release = r.id
LEFT JOIN release_label rl ON rl.release = r.id
LEFT JOIN release_country rc ON rc.release = r.id
LEFT JOIN iso_3166_1 iso ON iso.area = rc.country
$SQL$ || (CASE WHEN release_id IS NULL THEN '' ELSE 'WHERE r.id = $1' END) ||
$SQL$
GROUP BY ar.is_track_artist, ar.artist, rfrd.release, r.id
ORDER BY ar.artist, r.id, ar.is_track_artist
$SQL$
USING release_id;
END;
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION get_artist_release_rows(
release_id INTEGER
) RETURNS SETOF artist_release AS $$
BEGIN
-- PostgreSQL 12 generates a vastly more efficient plan when only
-- one release ID is passed. A condition like `r.id = any(...)`
-- can be over 200x slower, even with only one release ID in the
-- array.
RETURN QUERY EXECUTE $SQL$
SELECT DISTINCT ON (ar.artist, r.id)
ar.is_track_artist,
ar.artist,
integer_date(rfrd.year, rfrd.month, rfrd.day) AS first_release_date,
array_agg(
DISTINCT rl.catalog_number ORDER BY rl.catalog_number
) FILTER (WHERE rl.catalog_number IS NOT NULL)::TEXT[] AS catalog_numbers,
min(iso.code ORDER BY iso.code)::CHAR(2) AS country_code,
left(regexp_replace(
(CASE r.barcode WHEN '' THEN '0' ELSE r.barcode END),
'[^0-9]+', '', 'g'
), 18)::BIGINT AS barcode,
r.name,
r.id
FROM (
SELECT FALSE AS is_track_artist, racn.artist, r.id AS release
FROM release r
JOIN artist_credit_name racn ON racn.artist_credit = r.artist_credit
UNION ALL
SELECT TRUE AS is_track_artist, tacn.artist, m.release
FROM medium m
JOIN track t ON t.medium = m.id
JOIN artist_credit_name tacn ON tacn.artist_credit = t.artist_credit
) ar
JOIN release r ON r.id = ar.release
LEFT JOIN release_first_release_date rfrd ON rfrd.release = r.id
LEFT JOIN release_label rl ON rl.release = r.id
LEFT JOIN release_country rc ON rc.release = r.id
LEFT JOIN iso_3166_1 iso ON iso.area = rc.country
$SQL$ || (CASE WHEN release_id IS NULL THEN '' ELSE ' WHERE r.id = $1' END) ||
$SQL$
GROUP BY ar.is_track_artist, ar.artist, rfrd.release, r.id
ORDER BY ar.artist, r.id, ar.is_track_artist
$SQL$
USING release_id;
END;
$$ LANGUAGE plpgsql;
🤖 Prompt for AI Agents
In mbslave/sql/updates/20250425-mbs-13464.sql around lines 28 to 73, the dynamic
SQL concatenation for the WHERE clause in the get_artist_release_rows() function
lacks a leading space before 'WHERE', causing a syntax error by merging it with
the previous token. Fix this by adding a space or newline before 'WHERE' in the
concatenated string to ensure proper SQL syntax.

Comment on lines +72 to +73
CREATE INDEX artist_release_nonva_idx_sort ON artist_release_nonva (artist, first_release_date NULLS LAST, catalog_numbers NULLS LAST, country_code NULLS LAST, barcode NULLS LAST, name, release);
CREATE INDEX artist_release_va_idx_sort ON artist_release_va (artist, first_release_date NULLS LAST, catalog_numbers NULLS LAST, country_code NULLS LAST, barcode NULLS LAST, name, release);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add sort indexes for partitioned artist_release tables
Indexes on (artist, first_release_date, catalog_numbers, country_code, barcode, name, release) optimize queries that aggregate per artist. Consider CONCURRENTLY in production to avoid long locks.

🤖 Prompt for AI Agents
In mbslave/sql/CreateIndexes.sql around lines 72 to 73, add indexes on the
columns (artist, first_release_date, catalog_numbers, country_code, barcode,
name, release) for the partitioned tables artist_release_nonva and
artist_release_va to optimize artist-based aggregation queries. When applying
these indexes in production, use the CONCURRENTLY option to prevent long locking
of the tables during index creation.

Comment on lines +80 to +81
CREATE INDEX artist_release_group_nonva_idx_sort ON artist_release_group_nonva (artist, unofficial, primary_type_child_order NULLS FIRST, primary_type NULLS FIRST, secondary_type_child_orders NULLS FIRST, secondary_types NULLS FIRST, first_release_date NULLS LAST, name, release_group);
CREATE INDEX artist_release_group_va_idx_sort ON artist_release_group_va (artist, unofficial, primary_type_child_order NULLS FIRST, primary_type NULLS FIRST, secondary_type_child_orders NULLS FIRST, secondary_types NULLS FIRST, first_release_date NULLS LAST, name, release_group);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add sort indexes for partitioned artist_release_group tables
Similar composite indexes on release_group partitions help performance for group-based queries. Consider building these indexes concurrently.

🤖 Prompt for AI Agents
In mbslave/sql/CreateIndexes.sql around lines 80 to 81, the new composite
indexes on the partitioned artist_release_group tables should be created
concurrently to avoid locking issues during index creation. Modify the CREATE
INDEX statements to include the CONCURRENTLY keyword, ensuring the indexes are
built without blocking writes on the tables.


CREATE OR REPLACE FUNCTION controlled_for_whitespace(TEXT) RETURNS boolean AS $$
SELECT NOT padded_by_whitespace($1);
$$ LANGUAGE sql IMMUTABLE SET search_path = musicbrainz, public;
$$ LANGUAGE SQL IMMUTABLE SET search_path = musicbrainz, public;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Drop remaining SET search_path from function
The controlled_for_whitespace function still embeds a SET search_path = musicbrainz, public in its LANGUAGE clause. To fully eliminate schema search‐path assumptions within function bodies, please remove this directive and rely on the script-level or user-level setting instead.

🤖 Prompt for AI Agents
In mbslave/sql/CreateFunctions.sql at line 1592, remove the `SET search_path =
musicbrainz, public` directive from the LANGUAGE clause of the
`controlled_for_whitespace` function to avoid embedding schema search-path
assumptions inside the function. Rely on script-level or user-level search_path
settings instead.

Comment on lines +1 to +4
\set ON_ERROR_STOP 1

BEGIN;

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add explicit search_path at script start
This migration defines and drops functions and tables using unqualified names but doesn’t set a search_path. If run under a nondefault role, objects may end up in the wrong schema or fail to resolve. Please add a SET search_path = musicbrainz, public; immediately after BEGIN; to guarantee the script operates in the intended schema.

🤖 Prompt for AI Agents
In mbslave/sql/updates/20241017-mbs-9253-13464.sql at lines 1 to 4, the script
lacks an explicit search_path setting, which can cause schema resolution issues.
After the BEGIN; statement, add the line SET search_path = musicbrainz, public;
to ensure all unqualified object names are resolved in the correct schema during
the migration.

@nikammerlaan
Copy link

This PR does seem to be missing something. I setup a fresh database with this branch and then attempted to sync up to latest. I ran into this error:

INFO:mbslave.replication:Downloading https://metabrainz.org/api/musicbrainz/replication-177992-v2.tar.bz2?token=***
INFO:mbslave.replication:Packet was produced at 2025-06-28 01:06:34.456202+00
Traceback (most recent call last):
  File "/opt/pipx_bin/mbslave", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/pipx/venvs/mbslave/lib/python3.12/site-packages/mbslave/replication.py", line 806, in main
    args.func(config, args)
  File "/opt/pipx/venvs/mbslave/lib/python3.12/site-packages/mbslave/replication.py", line 516, in mbslave_sync_main
    process_tar(packet, db, config, ignored_schemas, ignored_tables, schema_seq, replication_seq, hook)
  File "/opt/pipx/venvs/mbslave/lib/python3.12/site-packages/mbslave/replication.py", line 463, in process_tar
    importer.process()
  File "/opt/pipx/venvs/mbslave/lib/python3.12/site-packages/mbslave/replication.py", line 423, in process
    cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "artist_release_group_pending_update" does not exist
LINE 1: INSERT INTO artist_release_group_pending_update VALUES (OLD....
                    ^
QUERY:  INSERT INTO artist_release_group_pending_update VALUES (OLD.id)
CONTEXT:  PL/pgSQL function musicbrainz_new.a_upd_release_group_mirror() line 8 at SQL statement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MusicBrainz database schema change release, 2025-05-19
2 participants
0