Skip to content

Database Design

Bohdan Khorolets edited this page Sep 12, 2023 · 1 revision

Quick recap

We use a few different data sources:

  • ScyllaDB
  • AWS S3
  • NEAR JSON-RPC

The main source of data for the ReadRPC project is ScyllaDB. However, sometimes, the data size is too big to duplicate in ScyllaDB when the data is on AWS S3. Sometimes, we store a “map” of the AWS S3 data in the ScyllaDB to know exactly where to look.

Thus, the design of the ScyllaDB tables follows the approach for the most efficient way to read the data and return the result to a ReadRPC requester.

Data in ScyllaDB is filled with dedicated indexers built on top of the NEAR Lake Framework (Rust):

  • state-indexer - everything account state related + metadata for blocks/chunks
  • tx-indexer - collecting “full” transactions (Transaction itself, related Receipts and ExecutionOutcomes) for faster reading

State

A contract state can be one of these types:

  • Data - contract state data
  • AccessKey_ - Access Keys added to an account
  • Contract code - contract code itself
  • Account - account-related data like balance

Since the read use cases for all these types are separate, we split them into separate tables to avoid unnecessary data filtering (following the recommended approach of ScyllaDB).

StateChangeWithCauseView structure from near-primitives::views is used.

While I haven’t created a doc where a reader can learn more about StateChangesWithCauseView, I’ll try to explain in a few words here.

This structure represents all the changes to the state happening in the particular Block. If a contract call happens and, as a result, its state has changed, it will be reflected in the StateChangeWithCauseView for the block where it happened.

For example, if someone has created a post in NEAR Social, it means that the value of the state for the NEAR Social contract has changed

BEFORE

{"root.near": { post: { main: null } } }

AFTER

{"root.near":  { post: { main: { content: "Hello world!", type: "md", blockHeight: 100500 } } } }

Though the data structure might be different, all the state is represented as a key-value storage in the blockchain, so we can observe a change that will tell us:

The state for contract social.near has been updated in this block. Now the value for the key <blob> is <blob>, thank you for your attention.

So, if we watch for all such changes, we can store them and have a history:

<block_height> <contract_id> <key> <value>

Thus, later, we can retrieve a value for a specific block.

state_changes_data

account_id block_height block_hash data_key data_value
varchar varint varchar varchar BLOB
      hex of the key BLOB to be able to perform LIKE queries  

PARTITION KEY (account_id, data_key)

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

Note: NEAR blockchain nature allows more than one change to the same key within a single block. It will end up storing the very last of them, though. Thus, instead of relying on rewriting the same row repeatedly, we introduced a logic in the indexer where we ignore all the previous changes to the key and store only the last one in ScyllaDB.

P.S. The first naive approach was to rely on the UPSERT, but asynchronous code screwed it, so we had the above logic.

state_changes_access_key

account_id block_height block_hash data_key data_value
varchar varint varchar varchar BLOB
      hex of the key BLOB to be able to perform LIKE queries  

PARTITION KEY (account_id, data_key)

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

You may notice that the table state_changes_access_key has the same structure as state_changes_data. This is because the data structure representing these types of data is the same. Thus, we just split the change types for more efficient read operations.

The same AccessKey in the NEAR blockchain can be added and deleted for the same account a bunch of times within the same block, and yet we need to store only the very last change, so we have a custom logic to skip all the changes for the same account with the same AccessKey except for the last change.

account_access_keys (account_access_keys feature)

account_id block_height active_access_keys
varchar varint map<varchar, BLOB>
    Map key is the account’s PublicKey and the value is AccessKey (permission, allowance, etc.)

PARTITION KEY account_id

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

This table efficiently answers the request “Give me all AccessKeys for the account for the given block”. It stores the history of AccessKeys for the account at a given moment.

The logic behind this table is hidden behind the feature account_access_keys. Because having this naive approach leads to fat partitions, we need to redesign it entirely.

state_changes_contract

account_id block_height block_hash data_value
varchar varint varchar BLOB
      Contract code bytes

PARTITION KEY account_id

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

This table stores the history of contract code changes.

state_changes_account

account_id block_height block_hash data_value
varchar varint varchar BLOB
      Account data

PARTITION KEY account_id

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

Similarly to contract code, this table stored the account-related data, represented by the near-primitives::account::Account structure (amount, locked, code hash, storage usage, version)

It serves the needs for requests like “Give me account X for the block N”.

account_state

account_id data_key
varchar varchar
  Hex of the blob key to perform LIKE queries

PARTITION KEY account_id

CLUSTERING KEY data_key

This helper table stores available contract keys, so-called the most recent ones.

Core Primitives

By core primitives, I mean entities straight from the blockchain like Block, Chunk, Transaction, Receipt, ExecutionOutcome, StateChanges, etc.

chunks

chunk_hash block_height shard_id stored_at_block_height
varchar varint varint varint

PRIMARY KEY (chunk_hash, block_height)

The chunks table, along with the Global Secondary Index on block_height serves the needs of methods like block and chunk. This table is a “map” that helps the ReadRPC server to find the read data on AWS S3.

transactions_details

transaction_hash block_height account_id transaction_details
varchar varint varchar BLOB
  We used to have a TX hash collision on mainnet some time ago. Having this value prevents us from troubles   Borsh-serialized bytes of the structure TransactionDetails introduced in the tx-indexer (stores all the entities related to a single transaction)

PARTITION KEY transaction_hash

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

This table serves the needs of a couple of methods. One of them returns the requested Transaction with the initial ExecutionOutcome and a Receipt the Transaction was converted into. The other one returns Transaction and all the Receipts and ExecutionOutcome in the Transaction chain.

We are collecting this data with the tx-indexer and decided to store the entire response for fast response.

receipts_map

receipt_id block_height parent_transaction_hash shard_id
varchar varint varchar varint
    This is a way to make a dream come true
 
PARTITION KEY receipt_id

This table is a “map” to find Receipts on AWS S3 faster. Meanwhile, storing the field parent_transaction_hash is a way to make a brand-new method that can solve many problems for indexer developers.