Currently, when the WAL snapshots, it deletes all files that were snapshotted. For downstream Enterprise replicas, we'll want to keep some WAL files around so they can pick them up even if they lag behind a bit.
Add a configuration option for the number of snapshotted WAL files to leave behind. The default should be 300 (at least 5 minutes of data with default settings).
With #25787 merged in, the restart/replay process will now ignore previously snapshotted files.
We need to update replay and the snapshot process to not automatically delete all snapshotted files. Instead, the WAL should track the file number of the oldest WAL file. When a snapshot comes through, you can subtract the oldest number from the last snapshot number to determine how many snapshotted files exist.
So we have:
* keep-snapshotted-wal-count
* oldest_wal_number
* latest_wal_number (the most recent file written)
* last_snapshot_number
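Where oldest_wal_number < last_snapshot_number < latest_wal_number always. The number of snapshotted WAL files we have kept on object store is last_snapshot_number - oldest_wal_number. After each snapshot, we want to delete files starting from the oldest until only keep-snapshotted-wal-count snapshotted files remain.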
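The bookkeeping above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `WalNumbers` class and `files_to_delete` function are hypothetical names, and the sketch assumes the snapshotted files are numbered from oldest up to (but not including) last_snapshot, which matches the count last - oldest given above.

```python
from dataclasses import dataclass


@dataclass
class WalNumbers:
    """The four numbers from the issue (class name is hypothetical)."""
    oldest: int          # oldest_wal_number
    last_snapshot: int   # last_snapshot_number
    latest: int          # latest_wal_number
    keep_snapshotted_wal_count: int = 300  # default proposed in the issue


def files_to_delete(n: WalNumbers) -> range:
    """Return the WAL file numbers that can be deleted after a snapshot.

    Assumption: files [oldest, last_snapshot) are the snapshotted ones,
    so their count is last_snapshot - oldest.
    """
    snapshotted = n.last_snapshot - n.oldest
    excess = max(0, snapshotted - n.keep_snapshotted_wal_count)
    # Delete the oldest `excess` files; the new oldest is oldest + excess.
    return range(n.oldest, n.oldest + excess)


if __name__ == "__main__":
    nums = WalNumbers(oldest=100, last_snapshot=500, latest=520)
    doomed = files_to_delete(nums)
    print(f"delete {len(doomed)} files, new oldest = {nums.oldest + len(doomed)}")
```

When fewer than keep-snapshotted-wal-count snapshotted files exist, the returned range is empty and nothing is deleted.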
During replay you can ignore deletions completely, as long as the WAL is initialized with the 4 numbers we need.
One important bit about the restart is that we don't want to actually load all the WAL files between oldest and last_snapshot, since we don't need that data. That means the startup process should first look for the latest persisted snapshot to get the last_snapshot number.
Then we'll need to do as many object store LIST operations on the WAL directory as it takes to get the full range of files there. We only need to know oldest and latest. Now we have our 4 numbers, and we can load all the WAL files from last_snapshot to latest into the QueryableBuffer.
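A rough sketch of that startup scan is shown below. Everything here is illustrative: the file naming (`<zero-padded number>.wal`), the function names, and the paged listing input are assumptions, and the pages stand in for the results of repeated object store LIST calls. The sketch also assumes replay starts at the first file after last_snapshot, since that file has already been snapshotted.

```python
from typing import Iterable, Optional, Tuple


def wal_number(path: str) -> int:
    """Parse a WAL file number from a path (naming scheme is hypothetical)."""
    name = path.rsplit("/", 1)[-1]
    return int(name.split(".", 1)[0])


def scan_wal_range(pages: Iterable[Iterable[str]]) -> Optional[Tuple[int, int]]:
    """Walk every LIST page and return (oldest, latest), or None if empty.

    Only the minimum and maximum file numbers are tracked; no file
    contents are loaded.
    """
    oldest = latest = None
    for page in pages:
        for path in page:
            n = wal_number(path)
            oldest = n if oldest is None else min(oldest, n)
            latest = n if latest is None else max(latest, n)
    return None if oldest is None else (oldest, latest)


def files_to_replay(last_snapshot: int, latest: int) -> range:
    """File numbers to load into the buffer (assumes last_snapshot is excluded)."""
    return range(last_snapshot + 1, latest + 1)


if __name__ == "__main__":
    pages = [
        ["wal/00000000005.wal", "wal/00000000006.wal"],
        ["wal/00000000007.wal", "wal/00000000008.wal"],
    ]
    oldest, latest = scan_wal_range(pages)
    print(oldest, latest, list(files_to_replay(last_snapshot=6, latest=latest)))
```

Files between oldest and last_snapshot are never read during this scan; only their numbers matter for later deletion decisions.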
This commit allows a configurable number of WAL files to be left behind
in object storage (OS). This is necessary because Enterprise replicas rely on these files.
closes: #25788
* feat: introduce num wal files to keep
* refactor: address PR feedback
* refactor: address PR comment