Better support for gradual migrations: better tooling for CREATE INDEX CONCURRENTLY

Say one wants to add a new column to a table and populate it gradually, and they want to index it but not pay the price of locking the table due to CREATE INDEX in their next deployment. Because of that, they want to use CREATE INDEX CONCURRENTLY.

But they don't want to put either of those statements (CONCURRENTLY or otherwise) in a codd migration because the table is large so it could take long, keeping the application down longer than necessary during the deployment.

How would one get the best of all worlds in this case? Currently, it would have to be something along these lines:

1 - One PR with ALTER TABLE .. ADD COLUMN .. and extra code to gradually populate the new column.
2 - Someone runs CREATE INDEX CONCURRENTLY manually to the environment where this is deployed, but only after deployment of the PR above.
3 - Another PR with CREATE INDEX CONCURRENTLY added as a dev-only migration gets merged immediately after step 2 is done. This will make schemas match before future deployments.

There are multiple issues with the strategy above. The main one is that this won't work if there are multiple environments with different deployment schedules, like Staging and Production. It would work for Staging, but Production would face a schema mismatch because the index will be in the disk representations but not in the pending migrations.

Even for a single environment, the need for lockstep actions and manual application of SQL is unacceptable.

We need to think of tooling to avoid manual interventions and still get all the positives above in a single PR.

One very complicated idea would be to have a special -- codd: no-txn-apply-last-with: 2000-01-01-some-previous-migration.sql top-level directive that will make the migration with this directive:

Be applied when 2000-01-01-some-previous-migration.sql is applied, but after every other pending migration and outside a transaction.
Have codd signal that all other "regular" migrations have been applied, by e.g. a LISTENER mechanism. This is necessary so that e.g. a load balancer knows to start serving requests before waiting for that last migration to finish. Let's see if that would work through an example:

Migration 1: 2020-01-01-add-col.sql

Alter table sometbl add column new_col text;

Migration 2:

-- codd: no-txn-apply-last-with: 2000-01-01-add-col.sql

Create index concurrently ... (new_col)

These would be added in a single PR. But what about schema comparison? Since we want the newly deployed app to start serving requests before this last special migration is applied, we need to check schemas then. But we definitely want to compare schemas after the special migration, too. Should we store both?

I don't know yet. One idea is to have another special directive as such:

-- codd: schema-diff-begin
(here comes a multi-line json diff of the schema changes this migration introduces)
-- codd: schema-diff-end

This would allow codd to build both schemas, and could be automatically added by codd add, so it could work nicely.

But would it work for other gradual migration scenarios? What if other PRs are merged with schema changes at some point; would that break the future?

To avoid even thinking about that, we could modify on-disk representations so that every field has both the before-value and the after-value for each such special migration. But we'd want to remove those from on-disk representations at some point, which sounds terrible. So scrap that idea.

Back to storing diffs in migrations themselves, then.

Say another developer carelessly adds a migration with ALTER INDEX new_index_created_in_special_migration RENAME TO some_other_name.

This migration cannot be deployed with the previous special migration, as there'd be no index to rename.
Locally a developer might not catch this if they run codd up, then create this new migration, then run codd up again.

However, a fresh codd up in CI would catch it. The problem is.. a fresh codd up in CI will become a problem at some point, since eventually devs should have the capability of modifying that index.

That the migration that renames the index created in that special migration needs to run after the special migration is inevitable. And given the special migration runs last among pending migrations, the only thing we can do is have yet another top-level directive to indicate that this migration depends on the special migration being applied.

It might be possible for codd add to detect that this migration changes the same parts of on-disk reps that the diff in the special migration does, and then add the dependency automatically. That could be -- codd: run-after 2020-01-02-create-idx-concurrently.sql. But wouldn't codd have to look into applied migrations to find that? Yes, it probably would. And further migrations would added afterwards would have to see that a migration with run-after already exists to avoid adding it again.

So much complexity. And it wouldn't even fully work! It is possible a special migration creates two indexes, in which case it should be possible for two migrations to have a -- codd: run-after ..., one per created index, if the user so desires.

It seems there are too many possibilities. It would be better if we constrain the problem. Let's change the names of things so they're not as generic.

Let's rename -- codd: no-txn-apply-last-with: 2000-01-01-add-col.sql to -- codd: begin-concurrent-migration-with: 2000-01-01-add-col.sql and let's rename -- codd: run-after ... to -- codd: end-concurrent-migration ....

We can force for these migrations to come in triples, and no more. Then, codd up will look for pending migrations and:

If only the first two are present, apply the first in a batch, signal deployment is over, then apply the concurrent migration
- If there are more deployments before the third migration is added, they should have schemas with the created index due to developers' machines having that applied, and everything should proceed as usual
- If developers add migrations that require the changes from the begin-concurrent-migration-with... migration a fresh codd up would barf, and this might be good enough: we can strongly recommend devs run codd up from a fresh DB in CI to catch this, and many other things, too. But codd should definitely print out nice errors here, e.g. "I found a migration with begin-concurrent-migration-with.... These migrations are complicated to handle because they're applied after every other migration unless another very special migration exists to say otherwise. Might this be what's going wrong here? Read ... for more info".
- What if developers add a migration with an earlier timestamp that interferes with these special migrations? codd up would also catch this either though a sql error or a schema mismatch. Even if the user merges master in and runs a fresh codd up && codd write-schema locally before committing, codd should detect locally (and in CI) that the schemas before the special migration does not match the on-disk schema minus the schema diffs in the migration.
  - What if the developer then changes the schema diff inside the existing migration? We put a comment in that schema diff saying "do not modify!" and blame them if they do it. Of course, we add a nice error message on codd up suggesting "Schema diffs before and after the special migration do not match. Have you added a migration with a timestamp before the special migration that possibly interferes with it? If so, try to bump your migration's timestamp".
If only the third is present this is fine and it gets applied as a normal migration alongside all others.
If all three are present, include all in the batch, apply them in normal order alongside every other pending migration and signal deployment only after all are applied. This will not have the desired effect of a gradual migration but is the only way to act consistently and safely, since the user is explicitly telling us to apply every pending migration, and all three migrations are in there.

Todo: continue thinking and writing about this feature (WIP).

Other statements user may want to run "outside" the main batch of pending migrations:

REFRESH MATERIALIZED VIEW CONCURRENTLY
VACUUM
REINDEX CONCURRENTLY

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Other statements user may want to run "outside" the main batch of pending migrations:

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Description

Other statements user may want to run "outside" the main batch of pending migrations:

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions