Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make index creation idempotent #2197

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

josephschorr
Copy link
Member

  1. Make each non-transactional migration step contain only a single migration operation. This ensures that if the operation fails, we don't end up "halfway" into a migration step. In the future, we'll enforce this via testing.
  2. Change how concurrent indexes are created. See: https://www.shayon.dev/post/2024/225/stop-relying-on-if-not-exists-for-concurrent-index-creation-in-postgresql/ for why this is important

@github-actions github-actions bot added the area/datastore Affects the storage system label Jan 8, 2025
Copy link
Contributor

@vroldanbet vroldanbet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach seems sensible, although we cannot enforce migration designs like this moving forward.

I don't see why this has to be done in different migrations if the column creation is also designed to be idempotent (e.g., checking the catalog to see if the column was already created). Ideally, we have self-contained migrations, but I don't strongly oppose doing this—it would also work.

Example of checking if column exists:

DO $$
BEGIN
    -- Add column if it doesn't exist
    IF NOT EXISTS (
        SELECT 1 
        FROM information_schema.columns 
        WHERE table_name = 'your_table' 
        AND column_name = 'new_column'
    ) THEN
        EXECUTE 'ALTER TABLE your_table ADD COLUMN new_column type';
    END IF;
END $$;

I also wondered why we weren't running those in the same transaction (supported via the migration framework), but it's impossible to run CREATE INDEX CONCURRENTLY transactionally with another statement.

There is a bunch of caveats with CREATE INDEX CONCURRENTLY.

One issue I see is that the idempotent drop/create index can enter an unrecoverable (and potentially disruptive by sustaining high-load for an extended period?) state if timeouts (migration timeout, statement timeout) are hit.

  1. migration runs
  2. drop index
  3. create index
  4. index creation fails due to SQL statement timeout or migration timeout, marked as invalid
  5. migration terminates with an error
  6. GOTO 1 (operator retries migration)

Coding defensively would be complex, so I suggest we log the error and provide some advice in the log (e.g., Please consider increasing your statement timeout and migration timeout using flags...) and let a human intervene.

I also did some research, and there is no scenario in which a session terminates and index creation continues, so we are guaranteed that it will always end up in an invalid state if it has not succeeded (it will never run detached).

@vroldanbet vroldanbet changed the title Change the postgres migrations in two ways: make index creation idempotent Jan 9, 2025
@josephschorr
Copy link
Member Author

Updated as discussed

@josephschorr josephschorr marked this pull request as ready for review January 10, 2025 22:07
@josephschorr josephschorr requested a review from a team as a code owner January 10, 2025 22:07
1) Make each non-transactional migration step contain only a single migration operation. This ensures that if the operation fails, we don't end up "halfway" into a migration step. In the future, we'll enforce this via testing.
2) Change how concurrent indexes are created. See: https://www.shayon.dev/post/2024/225/stop-relying-on-if-not-exists-for-concurrent-index-creation-in-postgresql/ for why this is important
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datastore Affects the storage system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants