Function source normalization (pg_dump)

The Codd representation of a PostgreSQL function/procedure includes the MD5 hash of the source code of the function body when the function is implemented in SQL or PL/pgSQL (reference). This function body source code is a string that includes the exact whitespace used during creation. This whitespace is collapsed when the database is dumped using pg_dump, however, causing a difference in representation.

This issue can be avoided by not using pg_dump schema dumps. For example, dumps are often used when upgrading PostgreSQL. An alternative is to initialize the schema on the new version from the migrations (using Codd) and to only use pg_dump to migrate the data. Such schema dumps are the de facto standard way to save a snapshot of a database (schema), however, so being able to compare them would be nice.

This issue could be resolved by normalizing the function body source code. This requires parsing the source code. (Doing this correctly requires parsing of nested dollar-quoted strings using identifiers, using an FSM augmented with an identifier stack.) I have not tried implementing this, but my current idea of how to do so it to change the representation to include the full source code. An option could then be used to determine how that source code is compared (equality by default, normalized when requested). Would including the full source code in the representation be problematic?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions