Database Design

Quick recap

We use a few different data sources:

ScyllaDB
AWS S3
NEAR JSON-RPC

The main source of data for the ReadRPC project is ScyllaDB. However, sometimes, the data size is too big to duplicate in ScyllaDB when the data is on AWS S3. Sometimes, we store a “map” of the AWS S3 data in the ScyllaDB to know exactly where to look.

Thus, the design of the ScyllaDB tables follows the approach for the most efficient way to read the data and return the result to a ReadRPC requester.

Data in ScyllaDB is filled with dedicated indexers built on top of the NEAR Lake Framework (Rust):

state-indexer - everything account state related + metadata for blocks/chunks
tx-indexer - collecting “full” transactions (Transaction itself, related Receipts and ExecutionOutcomes) for faster reading

State

A contract state can be one of these types:

Data - contract state data
AccessKey_ - Access Keys added to an account
Contract code - contract code itself
Account - account-related data like balance

Since the read use cases for all these types are separate, we split them into separate tables to avoid unnecessary data filtering (following the recommended approach of ScyllaDB).

StateChangeWithCauseView structure from near-primitives::views is used.

While I haven’t created a doc where a reader can learn more about StateChangesWithCauseView, I’ll try to explain in a few words here.

This structure represents all the changes to the state happening in the particular Block. If a contract call happens and, as a result, its state has changed, it will be reflected in the StateChangeWithCauseView for the block where it happened.

For example, if someone has created a post in NEAR Social, it means that the value of the state for the NEAR Social contract has changed

BEFORE

{"root.near": { post: { main: null } } }

AFTER

{"root.near":  { post: { main: { content: "Hello world!", type: "md", blockHeight: 100500 } } } }

Though the data structure might be different, all the state is represented as a key-value storage in the blockchain, so we can observe a change that will tell us:

The state for contract social.near has been updated in this block. Now the value for the key <blob> is <blob>, thank you for your attention.

So, if we watch for all such changes, we can store them and have a history:

<block_height> <contract_id> <key> <value>

Thus, later, we can retrieve a value for a specific block.

`state_changes_data`

account_id	block_height	block_hash	data_key	data_value
varchar	varint	varchar	varchar	BLOB
			hex of the key BLOB to be able to perform LIKE queries

PARTITION KEY (account_id, data_key)

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

Note: NEAR blockchain nature allows more than one change to the same key within a single block. It will end up storing the very last of them, though. Thus, instead of relying on rewriting the same row repeatedly, we introduced a logic in the indexer where we ignore all the previous changes to the key and store only the last one in ScyllaDB.

P.S. The first naive approach was to rely on the UPSERT, but asynchronous code screwed it, so we had the above logic.

`state_changes_access_key`

account_id	block_height	block_hash	data_key	data_value
varchar	varint	varchar	varchar	BLOB
			hex of the key BLOB to be able to perform LIKE queries

PARTITION KEY (account_id, data_key)

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

You may notice that the table state_changes_access_key has the same structure as state_changes_data. This is because the data structure representing these types of data is the same. Thus, we just split the change types for more efficient read operations.

The same AccessKey in the NEAR blockchain can be added and deleted for the same account a bunch of times within the same block, and yet we need to store only the very last change, so we have a custom logic to skip all the changes for the same account with the same AccessKey except for the last change.

`account_access_keys` (`account_access_keys` feature)

account_id	block_height	active_access_keys
varchar	varint	map<varchar, BLOB>
		Map key is the account’s PublicKey and the value is AccessKey (permission, allowance, etc.)

PARTITION KEY account_id

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

This table efficiently answers the request “Give me all AccessKeys for the account for the given block”. It stores the history of AccessKeys for the account at a given moment.

The logic behind this table is hidden behind the feature account_access_keys. Because having this naive approach leads to fat partitions, we need to redesign it entirely.

`state_changes_contract`

account_id	block_height	block_hash	data_value
varchar	varint	varchar	BLOB
			Contract code bytes

PARTITION KEY account_id

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

This table stores the history of contract code changes.

`state_changes_account`

account_id	block_height	block_hash	data_value
varchar	varint	varchar	BLOB
			Account data

PARTITION KEY account_id

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

Similarly to contract code, this table stored the account-related data, represented by the near-primitives::account::Account structure (amount, locked, code hash, storage usage, version)

It serves the needs for requests like “Give me account X for the block N”.

`account_state`

account_id	data_key
varchar	varchar
	Hex of the blob key to perform LIKE queries

PARTITION KEY account_id

CLUSTERING KEY data_key

This helper table stores available contract keys, so-called the most recent ones.

Core Primitives

By core primitives, I mean entities straight from the blockchain like Block, Chunk, Transaction, Receipt, ExecutionOutcome, StateChanges, etc.

`chunks`

chunk_hash	block_height	shard_id	stored_at_block_height
varchar	varint	varint	varint

PRIMARY KEY (chunk_hash, block_height)

The chunks table, along with the Global Secondary Index on block_height serves the needs of methods like block and chunk. This table is a “map” that helps the ReadRPC server to find the read data on AWS S3.

`transactions_details`

transaction_hash	block_height	account_id	transaction_details
varchar	varint	varchar	BLOB
	We used to have a TX hash collision on mainnet some time ago. Having this value prevents us from troubles		Borsh-serialized bytes of the structure TransactionDetails introduced in the `tx-indexer` (stores all the entities related to a single transaction)

PARTITION KEY transaction_hash

CLUSTERING KEY block_height WITH CLUSTERING ORDER BY (block_height DESC)

This table serves the needs of a couple of methods. One of them returns the requested Transaction with the initial ExecutionOutcome and a Receipt the Transaction was converted into. The other one returns Transaction and all the Receipts and ExecutionOutcome in the Transaction chain.

We are collecting this data with the tx-indexer and decided to store the entire response for fast response.

`receipts_map`

receipt_id	block_height	parent_transaction_hash	shard_id
varchar	varint	varchar	varint
		This is a way to make a dream come true

PARTITION KEY receipt_id

This table is a “map” to find Receipts on AWS S3 faster. Meanwhile, storing the field parent_transaction_hash is a way to make a brand-new method that can solve many problems for indexer developers.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database Design

Quick recap

State

BEFORE

AFTER

`state_changes_data`

`state_changes_access_key`

`account_access_keys` (`account_access_keys` feature)

`state_changes_contract`

`state_changes_account`

`account_state`

Core Primitives

`chunks`

`transactions_details`

`receipts_map`

Clone this wiki locally

Database Design

Quick recap

State

BEFORE

AFTER

state_changes_data

state_changes_access_key

account_access_keys (account_access_keys feature)

state_changes_contract

state_changes_account

account_state

Core Primitives

chunks

transactions_details

receipts_map

Clone this wiki locally

`state_changes_data`

`state_changes_access_key`

`account_access_keys` (`account_access_keys` feature)

`state_changes_contract`

`state_changes_account`

`account_state`

`chunks`

`transactions_details`

`receipts_map`