Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psql/migrations: when no-transaction is set, split multi-statement sql into individual statements and execute them separately #3574

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yannleretaille
Copy link

This is going to be a controversial one.

Recently, support for adding -- no-transaction at the beginning of migration files was introduced, which will execute the migration without creating a transaction. This is useful for certain commands, such as CREATE INDEX CONCURRENTLY, which are not allowed to run inside transactions.

However, postgres will implicitly create a transaction block when a query contains multiple statements. This means that the -- no-transaction flag has inconsistent behavior depending on whether the migration contains just one statement or multiple ones, and that commands that are not allowed to run inside transactions will fail if part of a larger migration.

I propose to split the sql into individual statements when no_tx is set and to execute them separately, resulting in consistent behavior in line with how many sql tools handle this case. This was implemented using a custom pest grammar, which is very low overhead and performant (but does add a dependency).

One could argue that the current behavior is good (albeit lacking a clear error message) and that users should be encouraged to create small migrations with only one statement at a time when opting out of the safety of transactions. I would counter that a user that opts out of transactions usually has a reason for that and is aware that extra care has to be taken to ensure that migration is sound and tested, that an error during a no_tx migration will always require manual cleanup anyway, and that there are production cases where forcing the user to split up a logical migration into potentially many individual files can add complexity and make it harder to manage and audit (we recently had such case).

@abonander
Copy link
Collaborator

I don't really see the value in carrying all this extra complexity for what is a pretty niche use-case.

As far as I can tell, Diesel's migrations don't work this way, nor does Flyway--the two migrators I'm familiar with--and both allow you to configure migrations to run outside of a transaction.

@yannleretaille
Copy link
Author

Flyway actually does split migrations into individual statements, and even goes a step further and automatically detects if certain statements need to be run outside a transaction.

Another example is goose (Go, 7k stars) that also splits the sql into statements and even allows providing additional annotations per statement.

On the ruby/rails side, ActiveRecord supports setting disable_ddl_transaction! and providing multiple execute statements to achieve the same effect.

Most SQL tools (DBeaver, psql, etc.) will also split multi-statment sqls and run them sequentially.

I do agree about the added complexity, but I'm not sure if I would necessarily agree that it is a niche case when dealing with larger projects that use postgres.

@abonander
Copy link
Collaborator

abonander commented Oct 24, 2024

I dunno, I think it's a huge footgun because you can easily be left stuck with a partially applied migration. It won't even catch syntax errors until it goes to execute that statement. And if there's any bugs in the parser it could lead to some really confusing errors.

I think keeping our migrations simple is an advantage.

As a compromise, I could see adding a simple comment syntax like:

-- no-transaction

ALTER TABLE foo ADD COLUMN bar_id uuid;

-- Execute the following chunk in its own `Query` message:
-- statement-break
CREATE INDEX CONCURRENTLY ON foo(bar_id, foo_id);
-- statement-break

ALTER TABLE foo ADD FOREIGN KEY bar_id REFERENCES bar(bar_id);

Then instead of having to pseudo-parse SQL, we just check if a line begins with a certain string, similar to how we detect -- no-transaction currently.

It would also give you direct control over the splitting, and group things together which can execute in an autocommit block for a bit more safety.

@yannleretaille
Copy link
Author

yannleretaille commented Oct 24, 2024

Re: footguns
Opting out of transactions will always come with risks. I think making that clear in the documentation (I don't think it's fully documented yet anyway since it's a new feature) and improving the error messaging would be helpful additional steps. It might even be a good idea to rename it to -- unsafe-no-transaction (while still accepting the old version).

Re: parser
I did test the parser with a bunch of exotic cases (testcases-sql.txt) and couldn't find any issues with it. I will add extensive testing in case you greenlight this approach. The pest parser is very lightweight and fast, but it does end up being a "pseudo" parser as you correctly point out. I think that's OK if all that is needed is to split statements and given a lot of other libraries and tools parse sql for the purpose of splitting it seems to be an accepted approach. I did briefly consider using pg_query, which should "guarantee" correct splitting, but decided against it due to the very heavy dependencies and long compile times.

Re: comment annotation idea
Your idea of using comments to annotate the beginning of a new statement block in --no-transaction migrations sounds like a potentially good approach as well. It's definitely safer/easier to parse, and the explicit annotation would make it very clear what the migration is doing. However, I'm not totally convinced about the added safety benefit, as any failure in a multi-statement no_tx migration will always require manual intervention. That also means that the additional annotation is ultimately slightly redundant.

Please let me know if you think any of the two approaches would be acceptable to you and if yes, which one you'd prefer and I'll update the PR!

@abonander
Copy link
Collaborator

@yannleretaille I strongly prefer --statement-break because the parsing is simpler, doesn't really need to be syntax-aware, and the behavior is explicitly opt-in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants