Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wal checkpoint support #696

Open
2 of 6 tasks
sonhmai opened this issue Jan 15, 2025 · 4 comments
Open
2 of 6 tasks

Wal checkpoint support #696

sonhmai opened this issue Jan 15, 2025 · 4 comments

Comments

@sonhmai
Copy link
Contributor

sonhmai commented Jan 15, 2025

Hi folks,

I'd like to create this issue as an umbrella issue for adding WAL checkpoint support. See sqlite doc for wal checkpointing

Rational

  1. important feature for write ahead log to support recovery
  2. needed in benchmark(s) like mobibench like Mobibench mentioned in Add support for sqlite3_wal_checkpoint_*() #478

todos

Context

  • checkpoint is applying changes in WAL file to database file in wal journal_mode.
  • it is triggered
    • manually by pragma wal_checkpoint and sqlite3_wal_checkpoint*
    • automatically by the database when wal file is over a threshold

Feel free to comment, I will add the missing ones, or add more details.

@sonhmai
Copy link
Contributor Author

sonhmai commented Jan 19, 2025

Support for pragma wal_checkpoint; added in #694

@sonhmai
Copy link
Contributor Author

sonhmai commented Jan 19, 2025

checkpoint infra already across the modules in wal, pager, connection.

wal module

fn checkpoint(

connection

pub fn checkpoint(&self) -> Result<()> {

It seems that we need to two other things as part of checkpoint result later. Ref this todo.

  1. num modified pages written to wal file
  2. num pages moved to db after checkpoint

@sonhmai sonhmai changed the title add wal checkpoint support Wal checkpoint support Jan 19, 2025
penberg added a commit that referenced this issue Jan 20, 2025
Wire pragma wal_checkpoint to checkpoint infra
- add basic support for parsing and instruction emitting `pragma
wal_checkpoint;`
- checkpoint opcode for instruction
- checkpoint execution in `virtual machine`
- cli test
Part of #696.
Before
```
limbo> pragma wal_checkpoint;

  × Parse error: Not a valid pragma name
```
After
```
Enter ".help" for usage hints.
limbo> pragma wal_checkpoint;
0|0|0
```
```

Closes #694
@sonhmai
Copy link
Contributor Author

sonhmai commented Jan 23, 2025

I'm curious why limbo decides to only support wal mode. While SQLite has also rollback re journaling mode?
Can someone help explain?

Refs

  1. mentioned by @pereman2 here Centralize Rust integration and regression tests #753 (review)
  2. // One difference between SQLite and limbo is that we will never support multi process, meaning

@LtdJorge
Copy link
Contributor

The only benefit of journal vs WAL that I can think of is that journal mode lets you put the journal and database files in a network filesystem and access the DB from multiple clients over the network, while with WAL you have required data in memory that other networked clients cannot access.

From how WAL works:

The traditional rollback journal works by writing a copy of the original unchanged database content into a separate rollback journal file and then writing changes directly into the database file. In the event of a crash or ROLLBACK, the original content contained in the rollback journal is played back into the database file to revert the database file to its original state. The COMMIT occurs when the rollback journal is deleted.

The WAL approach inverts this. The original content is preserved in the database file and the changes are appended into a separate WAL file. A COMMIT occurs when a special record indicating a commit is appended to the WAL. Thus a COMMIT can happen without ever writing to the original database, which allows readers to continue operating from the original unaltered database while changes are simultaneously being committed into the WAL. Multiple transactions can be appended to the end of a single WAL file.

D. Richard Hipp on why rollback journal can be used with network FS while WAL cannot:

https://www.youtube.com/watch?v=gpxnbly9bz4&t=2010s

Basically, the rollback journal keeps the original pages from the database file, and the changes are made to the database file itself. In the event of a rollback, the pages from the rollback journal are written back to the database file. In WAL, since the changes are made to the WAL and not the database file, concurrent readers need to know if a page is in the database file or in the WAL file, and the mapping to current pages exists in a hash table in memory, thus only local clients can read its content.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants