Object Design And Schema
These are the primary classes and SQL tables used by Couchbase Lite. I’m describing them in language-neutral terms since I expect there to be multiple implementations.
The heart of Couchbase Lite is the Database class. On disk (or flash), it consists of a SQLite database file and an associated directory containing attachments. In memory it has:
- a database connection handle
- a set of View objects representing rows in the Views table
- a set of Replicator objects representing active replication tasks
The “docs” table stores document ID strings so they can be represented more compactly as foreign keys in the “revs” table.
| Column | Type | Description |
| --- | --- | --- |
| doc_id | integer | Primary key |
| docid | text | Document ID string |
Each row in the “revs” table is a revision of a document. The table models the sequence of updates, so that replication can proceed from any point when it connects to a peer.
| Column | Type | Description |
| --- | --- | --- |
| sequence | integer | Sequence number (this is the primary key; it is set to auto-increment without reusing any values) |
| doc_id | integer | Document ID (foreign key) |
| revid | text | Revision ID string |
| parent | integer | Parent revision’s sequence number, or null if no parent (foreign key) |
| current | boolean | Is this a current (leaf) revision? |
| deleted | boolean | Does this revision represent a deletion? |
| json | blob | Document contents in UTF-8 encoded JSON |
Note: To save space, the stored JSON does not include the `_id`, `_rev`, `_deleted` or `_attachments` properties; those are added when the JSON is returned from the API.
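To make the relationship between document IDs and revisions concrete, here is a minimal sketch using Python's `sqlite3` module. The DDL is reconstructed from the column descriptions above; the specific constraints (`NOT NULL`, `UNIQUE`, `AUTOINCREMENT`) and the `add_revision` helper are assumptions for illustration, not the actual implementation.

```python
import sqlite3

# Hypothetical DDL derived from the column tables; constraints are assumptions.
SCHEMA = """
CREATE TABLE docs (
    doc_id INTEGER PRIMARY KEY,
    docid  TEXT UNIQUE NOT NULL
);
CREATE TABLE revs (
    sequence INTEGER PRIMARY KEY AUTOINCREMENT,  -- values are never reused
    doc_id   INTEGER NOT NULL REFERENCES docs(doc_id),
    revid    TEXT NOT NULL,
    parent   INTEGER REFERENCES revs(sequence),
    current  BOOLEAN,
    deleted  BOOLEAN DEFAULT 0,
    json     BLOB
);
"""

def add_revision(db, docid, revid, body, parent=None):
    """Insert one revision and return its sequence number (illustrative helper)."""
    # Store the document ID string once; later revisions reuse the integer key.
    db.execute("INSERT OR IGNORE INTO docs (docid) VALUES (?)", (docid,))
    (doc_id,) = db.execute(
        "SELECT doc_id FROM docs WHERE docid = ?", (docid,)).fetchone()
    if parent is not None:
        # A revision with a child is no longer a leaf.
        db.execute("UPDATE revs SET current = 0 WHERE sequence = ?", (parent,))
    cur = db.execute(
        "INSERT INTO revs (doc_id, revid, parent, current, json) "
        "VALUES (?, ?, ?, 1, ?)",
        (doc_id, revid, parent, body))
    return cur.lastrowid

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
first = add_revision(db, "doc1", "1-abc", b'{"n": 1}')
second = add_revision(db, "doc1", "2-def", b'{"n": 2}', parent=first)
leaves = db.execute("SELECT revid FROM revs WHERE current = 1").fetchall()
```

After the two calls, both revisions share a single row in “docs”, and only the second revision is still marked as current.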
The “attachments” table tracks the attachments of revisions and their keys in the content-addressable BlobStore.
| Column | Type | Description |
| --- | --- | --- |
| sequence | integer | Revision that owns this attachment (foreign key) |
| filename | text | Filename of the attachment |
| key | blob | Contents’ key in attachment store (SHA-1 digest of contents) |
| type | text | MIME type |
| length | integer | Content length in bytes |
| encoding | integer | Type of encoding/compression (0 for none, 1 for gzip) |
| encoded_length | integer | Length of encoded data, if there’s an encoding |
| revpos | integer | Generation number (numeric revision prefix) where this attachment was added or changed |
Every ‘revs’ row has associated ‘attachments’ rows for every attachment it contains, not just for attachments added or modified in that revision. This does mean a lot of duplicate ‘attachments’ rows, but it makes attachment lookup faster, and compaction easier.
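The duplication described above can be sketched as follows. This is an illustration only: the DDL is inferred from the column table, and the helpers `add_attachment` and `copy_attachments` are hypothetical names. Foreign-key constraints are omitted to keep the example self-contained.

```python
import hashlib
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical DDL based on the column table above.
db.execute("""
CREATE TABLE attachments (
    sequence       INTEGER NOT NULL,
    filename       TEXT NOT NULL,
    key            BLOB NOT NULL,
    type           TEXT,
    length         INTEGER NOT NULL,
    encoding       INTEGER DEFAULT 0,
    encoded_length INTEGER,
    revpos         INTEGER DEFAULT 0
)""")

def add_attachment(db, sequence, filename, data, mime_type, revpos):
    """Record a new or changed attachment; its key is the SHA-1 of the contents."""
    key = hashlib.sha1(data).digest()
    db.execute(
        "INSERT INTO attachments (sequence, filename, key, type, length, revpos) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (sequence, filename, key, mime_type, len(data), revpos))

def copy_attachments(db, from_sequence, to_sequence):
    """Give a new revision its own rows for attachments it inherits unchanged."""
    db.execute(
        "INSERT INTO attachments "
        "(sequence, filename, key, type, length, encoding, encoded_length, revpos) "
        "SELECT ?, filename, key, type, length, encoding, encoded_length, revpos "
        "FROM attachments WHERE sequence = ?",
        (to_sequence, from_sequence))

add_attachment(db, 1, "photo.jpg", b"fake image bytes", "image/jpeg", revpos=1)
copy_attachments(db, 1, 2)  # revision 2 keeps the attachment unchanged
rows = db.execute(
    "SELECT sequence, filename, revpos, length(key) "
    "FROM attachments ORDER BY sequence").fetchall()
```

Both rows point at the same blob key and keep `revpos` 1, so looking up the attachments of revision 2 needs no walk back through its ancestors.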
Each row in the “views” table is a view definition.
| Column | Type | Description |
| --- | --- | --- |
| view_id | integer | Primary key |
| name | text | Name of view (unique) |
| version | text | Version ID of view definition function; must be changed if the function’s semantics change |
| lastsequence | integer | The last sequence number in “revs” that has been indexed by this view (foreign key) |
View definitions are not stored in the database as source code. They are native functions, represented by function pointers or their equivalent. The client must register each function with its named view when the database is opened.
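A rough sketch of what registering native map functions might look like. The `ViewRegistry` class, its method names, and the `emit` callback signature are all hypothetical. Treating a changed version string as a signal to rebuild the index is my reading of the “version” column, not something stated explicitly above.

```python
from typing import Callable, Dict, Tuple

# A map function receives the document and an emit(key, value) callback.
Emit = Callable[[object, object], None]
MapFunction = Callable[[dict, Emit], None]

class ViewRegistry:
    """Holds the in-memory map functions; only names and versions are persisted."""

    def __init__(self) -> None:
        self._views: Dict[str, Tuple[MapFunction, str]] = {}

    def register(self, name: str, map_fn: MapFunction, version: str) -> None:
        # Must be called each time the database is opened, since the
        # function itself cannot be stored in the database.
        self._views[name] = (map_fn, version)

    def map_function(self, name: str) -> MapFunction:
        return self._views[name][0]

    def needs_reindex(self, name: str, stored_version: str) -> bool:
        # Assumption: a version mismatch invalidates the existing index.
        return self._views[name][1] != stored_version

def by_type(doc: dict, emit: Emit) -> None:
    if "type" in doc:
        emit(doc["type"], doc.get("name"))

registry = ViewRegistry()
registry.register("by_type", by_type, version="1")

emitted = []
registry.map_function("by_type")(
    {"type": "person", "name": "Ann"},
    lambda key, value: emitted.append((key, value)))
```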
Each row in the “maps” table is a key/value pair emitted by a view’s map function.
| Column | Type | Description |
| --- | --- | --- |
| view_id | integer | View that emitted this row (foreign key) |
| sequence | integer | Revision that emitted this row (foreign key) |
| key | text | JSON-encoded emitted key |
| value | text | JSON-encoded emitted value |
Stores persistent state of replications to/from other databases. The Replicator class uses this.
| Column | Type | Description |
| --- | --- | --- |
| remote | text | URL of remote database |
| push | boolean | Is this a “push” replication, i.e. is ‘remote’ the destination? |
| last_sequence | text | Last sequence processed from the source database (which may or may not be local.) |
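
As an illustration of how a replicator might save and restore its checkpoint in this table, here is a small sketch. The table name `replicators`, the uniqueness constraint on `(remote, push)`, and both helper functions are assumptions. The example URL is a placeholder.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical DDL; one row per (remote, direction) pair is an assumption.
db.execute("""
CREATE TABLE replicators (
    remote        TEXT NOT NULL,
    push          BOOLEAN,
    last_sequence TEXT,
    UNIQUE (remote, push)
)""")

def save_checkpoint(db, remote, push, last_sequence):
    """Record the last sequence processed; replaces any earlier checkpoint."""
    db.execute(
        "INSERT OR REPLACE INTO replicators (remote, push, last_sequence) "
        "VALUES (?, ?, ?)",
        (remote, int(push), str(last_sequence)))

def load_checkpoint(db, remote, push):
    """Return the saved sequence as text, or None if there is no checkpoint."""
    row = db.execute(
        "SELECT last_sequence FROM replicators WHERE remote = ? AND push = ?",
        (remote, int(push))).fetchone()
    return row[0] if row else None

url = "http://example.com/db"  # placeholder remote URL
save_checkpoint(db, url, push=True, last_sequence=17)
save_checkpoint(db, url, push=True, last_sequence=42)
save_checkpoint(db, url, push=False, last_sequence="remote-seq-9")
```

Push and pull checkpoints for the same remote are kept separately, and the value is stored as text because a pull replication tracks the remote database's sequence, which need not be a local integer.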
Stores local documents. These are not replicated, don’t show up in views, and don’t store previous revisions. They are distinguished by having a document ID prefixed with “_local/”. Their main defined purpose is to store state information for replications.
| Column | Type | Description |
| --- | --- | --- |
| docid | text | Document ID (primary key) |
| revid | text | Current revision ID |
| json | blob | JSON contents |
A table that stores some persistent per-database information.
| Column | Type | Description |
| --- | --- | --- |
| key | text | Property name (primary key) |
| value | text | Property value |
Currently defined keys are “privateUUID” and “publicUUID”, each of which has a value that’s a randomly generated string. These are used to uniquely identify the source and target databases during replication.
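One way the two UUIDs could be created the first time a database is opened is sketched below. The table name `info`, the `ensure_uuids` helper, and the use of `uuid.uuid4()` as the random string generator are assumptions for illustration.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
# Hypothetical table name; the columns follow the table above.
db.execute("CREATE TABLE info (key TEXT PRIMARY KEY, value TEXT)")

def ensure_uuids(db):
    """Create the UUIDs if they are missing and return both as a dict."""
    for key in ("privateUUID", "publicUUID"):
        # OR IGNORE leaves an existing value untouched on later calls.
        db.execute(
            "INSERT OR IGNORE INTO info (key, value) VALUES (?, ?)",
            (key, uuid.uuid4().hex))
    return dict(db.execute("SELECT key, value FROM info"))

first_open = ensure_uuids(db)
second_open = ensure_uuids(db)  # values are stable once generated
```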
The View class is closely tied to the Database. It’s just broken out to give each view a place to store transient data (most importantly the map function pointer) and to make the API and implementation a bit clearer. Each View instance is associated with a row in the “views” table.
Instead of keeping a separate B-tree index for every view, Couchbase Lite has a single “maps” table. It contains a row for every key/value pair that was emitted by a map function of any view. Intermediate results from the reduce function are not stored, though (at least not yet).
Before a query, the View object compares its saved last_sequence value against the highest sequence number in the ‘revs’ table. If they don’t match, it needs to rebuild the index. To do this it first deletes map rows emitted by obsolete revisions (ones that appear as ‘parent’ values in revs added since last_sequence). Then it iterates over every rev since last_sequence, calls the map function on it, and adds any emitted key/value pairs to ‘maps’. Finally it updates its last_sequence.
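The update steps above can be sketched roughly as follows. This is a simplified illustration with pared-down tables, not the real implementation. The `update_index` function and its signature are hypothetical, and restricting the mapped revisions to current, non-deleted ones is my assumption, added so that a revision already superseded within the same batch does not leave stale rows behind.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Reduced versions of the 'revs' and 'maps' tables, enough for the sketch.
db.executescript("""
CREATE TABLE revs (sequence INTEGER PRIMARY KEY AUTOINCREMENT,
                   parent INTEGER, current BOOLEAN,
                   deleted BOOLEAN DEFAULT 0, json BLOB);
CREATE TABLE maps (view_id INTEGER, sequence INTEGER, key TEXT, value TEXT);
""")

def update_index(db, view_id, map_fn, last_sequence):
    """Bring one view's rows in 'maps' up to date; return the new last_sequence."""
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(sequence), 0) FROM revs").fetchone()
    if latest == last_sequence:
        return last_sequence  # index is already current
    # Step 1: delete rows emitted by revisions that became parents since then.
    db.execute(
        "DELETE FROM maps WHERE view_id = ? AND sequence IN "
        "(SELECT parent FROM revs WHERE sequence > ? AND parent IS NOT NULL)",
        (view_id, last_sequence))
    # Step 2: run the map function over the revisions added since then.
    rows = db.execute(
        "SELECT sequence, json FROM revs "
        "WHERE sequence > ? AND current = 1 AND deleted = 0 ORDER BY sequence",
        (last_sequence,)).fetchall()
    for sequence, body in rows:
        def emit(key, value, sequence=sequence):
            db.execute("INSERT INTO maps VALUES (?, ?, ?, ?)",
                       (view_id, sequence, json.dumps(key), json.dumps(value)))
        map_fn(json.loads(body), emit)
    # Step 3: the caller persists this as the view's new last_sequence.
    return latest

def by_name(doc, emit):
    emit(doc["name"], None)

db.execute("INSERT INTO revs (parent, current, json) VALUES (NULL, 1, ?)",
           (b'{"name": "a"}',))
last = update_index(db, 1, by_name, 0)
# The document is updated: revision 1 becomes the parent of revision 2.
db.execute("UPDATE revs SET current = 0 WHERE sequence = 1")
db.execute("INSERT INTO revs (parent, current, json) VALUES (1, 1, ?)",
           (b'{"name": "b"}',))
last = update_index(db, 1, by_name, last)
keys = [key for (key,) in db.execute("SELECT key FROM maps")]
```

After the second update the row emitted by revision 1 is gone, and only the JSON-encoded key from revision 2 remains.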
The BlobStore stores attachments for a database. It implements a simple content-addressable store of arbitrary-sized blobs of data. A blob is given a unique key that’s its SHA-1 digest, saved to a file named after the key, and then referred to in the database by its key. After the database is compacted, all blob files whose keys no longer appear in the database are deleted.
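A minimal content-addressable store along these lines might look like the sketch below. The class layout, the `.blob` file extension, and the `delete_except` method name are assumptions; only the use of the SHA-1 digest as the key and the cleanup after compaction come from the description above.

```python
import hashlib
import os
import tempfile

class BlobStore:
    """Stores each blob in a file named after the SHA-1 digest of its contents."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hypothetical naming scheme: hex digest plus a fixed extension.
        return os.path.join(self.directory, key.hex() + ".blob")

    def store(self, data):
        """Write the blob if it is not already present and return its key."""
        key = hashlib.sha1(data).digest()
        path = self._path(key)
        if not os.path.exists(path):  # identical contents are stored once
            with open(path, "wb") as f:
                f.write(data)
        return key

    def read(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()

    def delete_except(self, live_keys):
        """Delete every blob whose key is not in live_keys; return the count."""
        live = {self._path(key) for key in live_keys}
        removed = 0
        for name in os.listdir(self.directory):
            path = os.path.join(self.directory, name)
            if path not in live:
                os.remove(path)
                removed += 1
        return removed

store = BlobStore(tempfile.mkdtemp())
key_a = store.store(b"attachment one")
key_a_again = store.store(b"attachment one")  # same contents, same key
key_b = store.store(b"attachment two")
# After compaction only key_a is still referenced by the database.
removed = store.delete_except([key_a])
```

Because the key is derived from the contents, storing the same attachment twice costs no extra space.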
The DatabaseManager object is fairly simple — it represents the collection of named databases owned by the server. It stores:
- A reference to its root directory (which contains the database files)
- A dictionary mapping names to Database objects
- A ReplicatorManager
The Server sits atop a DatabaseManager and provides thread-safety. It creates a single background thread for Couchbase Lite to run on, and its public API lets the client submit tasks that are queued to run one at a time on that background thread. (In the Objective-C implementation these are given as blocks.)
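The threading model can be illustrated with a queue drained by one worker thread. This is only a sketch of the idea: the method names `submit`, `wait` and `close` and the thread name are hypothetical, and plain callables stand in for the blocks used by the Objective-C implementation.

```python
import queue
import threading

class Server:
    """Runs submitted tasks one at a time on a single background thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, name="cbl-background", daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel posted by close()
                break
            try:
                task()
            except Exception as exc:
                # Keep the worker alive if a task fails.
                print("task failed:", exc)
            finally:
                self._tasks.task_done()

    def submit(self, task):
        """Queue a callable; tasks run in submission order."""
        self._tasks.put(task)

    def wait(self):
        """Block until every queued task has finished."""
        self._tasks.join()

    def close(self):
        self._tasks.put(None)
        self._thread.join()

server = Server()
order = []
for i in range(5):
    server.submit(
        lambda i=i: order.append((i, threading.current_thread().name)))
server.wait()
server.close()
```

Since only the background thread ever touches the database objects, they need no locking of their own.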
Replicator is an abstract class representing an active replication. Its concrete subclasses are Pusher and Puller. Its properties are:
- the local database object
- the remote database URL
- a flag indicating whether the replication is continuous
- the last revision sequence number/ID transferred (persisted in both the local and remote databases)
ReplicatorManager
The ReplicatorManager is a (per-server) singleton that manages persistent replications. It watches the special database named `_replicator` and maps every document in it to a runtime instance of Replicator. As documents are created and updated it updates the replicators, and as replicator state changes it updates the documents.
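The document-to-replicator mapping might be sketched like this. Everything here is hypothetical: the `Replicator` stand-in, the `document_changed` callback, and the document fields `remote`, `push` and `continuous`, which simply mirror the Replicator properties listed earlier rather than the real document format.

```python
from dataclasses import dataclass

@dataclass
class Replicator:
    """Stand-in for a running replication (the real class does network I/O)."""
    remote: str
    push: bool
    continuous: bool = False
    running: bool = True

    def stop(self):
        self.running = False

class ReplicatorManager:
    """Keeps one Replicator per document in the `_replicator` database."""

    def __init__(self):
        self.replicators = {}  # document ID -> Replicator

    def document_changed(self, docid, doc):
        """Called when a document is created, updated, or deleted (doc=None)."""
        old = self.replicators.pop(docid, None)
        if old is not None:
            old.stop()  # an updated or deleted document retires its replicator
        if doc is None:
            return
        self.replicators[docid] = Replicator(
            remote=doc["remote"],
            push=doc["push"],
            continuous=doc.get("continuous", False))

manager = ReplicatorManager()
manager.document_changed(
    "rep1", {"remote": "http://example.com/db", "push": True})
first = manager.replicators["rep1"]
manager.document_changed(
    "rep1",
    {"remote": "http://example.com/db", "push": True, "continuous": True})
second = manager.replicators["rep1"]
manager.document_changed("rep1", None)  # document deleted
```

The sketch covers only one direction of the synchronization; writing replicator state changes back into the documents is left out.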
Router implements the REST API. An instance is responsible for handling a single request — it’s given the details of an HTTP request as a platform-specific object (e.g. a Cocoa NSURLRequest), interprets the method and path to determine what operation to perform, and then calls a method to perform that operation. The end result is a Response object containing the HTTP status code, headers and body.
Router doesn’t implement an HTTP server. It’s more like a servlet, taking a pre-parsed request and interpreting it. There are platform-specific higher layers (such as `CBLURLProtocol` for Cocoa) that glue Router instances into HTTP infrastructure.
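To illustrate the servlet-like design, here is a small sketch in which a Router instance takes an already-parsed method and path, dispatches to a handler method, and returns a Response. The handler naming scheme (`do_get_document` and so on), the two routes shown, and the in-memory `databases` dictionary are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""

class Router:
    """Handles exactly one pre-parsed request; it does no socket I/O itself."""

    LEVELS = ("root", "database", "document")

    def __init__(self, databases, method, path):
        self.databases = databases  # hypothetical: {db name: {docid: JSON bytes}}
        self.method = method
        self.path = path

    def run(self):
        parts = [p for p in self.path.split("/") if p]
        if len(parts) >= len(self.LEVELS):
            return Response(404)
        # Pick a handler from the method and path depth, e.g. do_get_document.
        name = "do_{}_{}".format(self.method.lower(), self.LEVELS[len(parts)])
        handler = getattr(self, name, None)
        if handler is None:
            return Response(405)
        return handler(*parts)

    def do_get_root(self):
        body = json.dumps({"databases": sorted(self.databases)})
        return Response(200, {"Content-Type": "application/json"},
                        body.encode("utf-8"))

    def do_get_document(self, db_name, docid):
        body = self.databases.get(db_name, {}).get(docid)
        if body is None:
            return Response(404)
        return Response(200, {"Content-Type": "application/json"}, body)

databases = {"db": {"doc1": b'{"n": 1}'}}
ok = Router(databases, "GET", "/db/doc1").run()
missing = Router(databases, "GET", "/db/nope").run()
unsupported = Router(databases, "DELETE", "/db").run()
root = Router(databases, "GET", "/").run()
```

A platform-specific layer would translate its native request object into the method and path, run the Router, and copy the Response back into its own response type.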
A Revision is a passive value object that bundles together the data of a single revision of a document. It has an immutable document ID and revision ID. It can also have a sequence number and a JSON body, which can be set after the object is created.
The body is abstracted as a Body class, which internally maintains two different representations: raw JSON data, and a pre-parsed object hierarchy. It can be instantiated with either form, and will transparently provide either form when asked, doing the JSON parsing or generation on the fly. This can help avoid unnecessary conversions: for example, when a document’s body is fetched it comes out of the database as JSON data. If there’s no need to manipulate the document before returning it, it can be stuffed directly into the HTTP response as data without having to parse it. But if it does need to be translated to a dictionary (e.g. for a multiple-document request) the Body object will do it.
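The dual representation can be sketched as a class that caches whichever form it was not given and converts only on demand. The constructor arguments and property names are hypothetical.

```python
import json

class Body:
    """Holds a document body as raw JSON, parsed properties, or both."""

    def __init__(self, json_data=None, properties=None):
        if json_data is None and properties is None:
            raise ValueError("need raw JSON data or a properties dictionary")
        self._json = json_data
        self._properties = properties

    @property
    def as_json(self):
        # Generate JSON only if the body was created from parsed properties.
        if self._json is None:
            self._json = json.dumps(self._properties).encode("utf-8")
        return self._json

    @property
    def properties(self):
        # Parse only on first access, then reuse the cached result.
        if self._properties is None:
            self._properties = json.loads(self._json)
        return self._properties

raw = b'{"title": "hello"}'
from_db = Body(json_data=raw)
passthrough = from_db.as_json            # no parsing happens here
parsed_before = from_db._properties      # still None at this point
title = from_db.properties["title"]      # parsed on demand

from_app = Body(properties={"n": 1})
encoded = from_app.as_json
```

Returning `as_json` for a body that came straight from the database hands back the original bytes, which is the conversion-free path described above.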