Spec: remove the JSON spec for content file and file scan task sections #9771

Merged
merged 1 commit into from
Jul 16, 2024
36 changes: 0 additions & 36 deletions format/spec.md
@@ -1230,42 +1230,6 @@ Example
] } ]
```

### Content File (Data and Delete) Serialization

Content file (data or delete) is serialized as a JSON object according to the following table.

| Metadata field |JSON representation|Example|
|--------------------------|--- |--- |
| **`spec-id`** |`JSON int`|`1`|
| **`content`** |`JSON string`|`DATA`, `POSITION_DELETES`, `EQUALITY_DELETES`|
| **`file-path`** |`JSON string`|`"s3://b/wh/data.db/table"`|
| **`file-format`** |`JSON string`|`AVRO`, `ORC`, `PARQUET`|
| **`partition`** |`JSON object: Partition data tuple using partition field ids for the struct field ids`|`{"1000":1}`|
| **`record-count`** |`JSON long`|`1`|
| **`file-size-in-bytes`** |`JSON long`|`1024`|
| **`column-sizes`** |`JSON object: Map from column id to the total size on disk of all regions that store the column.`|`{"keys":[3,4],"values":[100,200]}`|
| **`value-counts`** |`JSON object: Map from column id to number of values in the column (including null and NaN values)`|`{"keys":[3,4],"values":[90,180]}`|
| **`null-value-counts`** |`JSON object: Map from column id to number of null values in the column`|`{"keys":[3,4],"values":[10,20]}`|
| **`nan-value-counts`** |`JSON object: Map from column id to number of NaN values in the column`|`{"keys":[3,4],"values":[0,0]}`|
| **`lower-bounds`** |`JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["01000000","02000000"]}`|
| **`upper-bounds`** |`JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["05000000","0A000000"]}`|
| **`key-metadata`** |`JSON string: Encryption key metadata binary serialized as hexadecimal string`|`00000000000000000000000000000000`|
| **`split-offsets`** |`JSON list of long: Split offsets for the data file`|`[128,256]`|
| **`equality-ids`** |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
| **`sort-order-id`** |`JSON int`|`1`|
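
To make the removed table concrete, here is a minimal Python sketch that builds one content file object in the shape the table describes. It is an illustration only, not the Iceberg implementation: the helpers `id_map` and `int_bound_hex` are hypothetical names, the values are copied from the table's Example column, and the bounds are assumed to be 4-byte little-endian integers, which is what the `01000000` and `0A000000` examples suggest.

```python
import json
import struct


def id_map(mapping):
    """Encode {column id: value} as the {"keys": [...], "values": [...]} shape
    used by the table's examples (hypothetical helper)."""
    keys = sorted(mapping)
    return {"keys": keys, "values": [mapping[k] for k in keys]}


def int_bound_hex(value):
    """Hex string for an int bound.

    Assumption: the bound is a 4-byte little-endian integer, so 1 becomes
    "01000000" and 10 becomes "0A000000", as in the table's examples.
    """
    return struct.pack("<i", value).hex().upper()


# Illustrative data file entry; every value comes from the Example column.
content_file = {
    "spec-id": 1,
    "content": "DATA",
    "file-path": "s3://b/wh/data.db/table",
    "file-format": "PARQUET",
    # Partition tuple keyed by partition field id.
    "partition": {"1000": 1},
    "record-count": 1,
    "file-size-in-bytes": 1024,
    "column-sizes": id_map({3: 100, 4: 200}),
    "value-counts": id_map({3: 90, 4: 180}),
    "null-value-counts": id_map({3: 10, 4: 20}),
    "nan-value-counts": id_map({3: 0, 4: 0}),
    "lower-bounds": id_map({3: int_bound_hex(1), 4: int_bound_hex(2)}),
    "upper-bounds": id_map({3: int_bound_hex(5), 4: int_bound_hex(10)}),
    # Encryption key metadata as a hex string (16 zero bytes here).
    "key-metadata": "00" * 16,
    "split-offsets": [128, 256],
    "sort-order-id": 1,
}

serialized = json.dumps(content_file)
```

`equality-ids` is left out because the table ties it to equality delete files, and this entry has `content` set to `DATA`.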

### File Scan Task Serialization

File scan task is serialized as a JSON object according to the following table.

| Metadata field |JSON representation|Example|
|--------------------------|--- |--- |
| **`schema`** |`JSON object`|`See above, read schemas instead`|
| **`spec`** |`JSON object`|`See above, read partition specs instead`|
| **`data-file`** |`JSON object`|`See above, read content file instead`|
| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
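
The five fields in the removed file scan task table could be assembled along the following lines. This is a sketch under stated assumptions: `file_scan_task` is a hypothetical helper, the schema and partition spec objects are small illustrative stand-ins for the JSON forms the table points to, and the content file entries carry only a few of their fields to keep the example short.

```python
import json


def file_scan_task(schema, spec, data_file, delete_files, residual_filter):
    """Assemble the five fields listed in the table into one JSON-ready dict
    (hypothetical helper, not an Iceberg API)."""
    return {
        "schema": schema,
        "spec": spec,
        "data-file": data_file,
        "delete-files": list(delete_files),
        "residual-filter": residual_filter,
    }


# Illustrative inputs; field names and ids are assumptions for this sketch.
schema = {
    "type": "struct",
    "schema-id": 0,
    "fields": [{"id": 3, "name": "id", "required": True, "type": "int"}],
}
spec = {
    "spec-id": 1,
    "fields": [
        {"source-id": 3, "field-id": 1000, "name": "id", "transform": "identity"}
    ],
}
data_file = {
    "spec-id": 1,
    "content": "DATA",
    "file-path": "s3://b/wh/data.db/table",
    "file-format": "PARQUET",
    "partition": {"1000": 1},
    "record-count": 1,
    "file-size-in-bytes": 1024,
}
# Same entry reused as a position delete file, purely for illustration.
delete_file = dict(data_file, content="POSITION_DELETES")

task = file_scan_task(
    schema,
    spec,
    data_file,
    [delete_file],
    # Residual filter example taken from the table.
    {"type": "eq", "term": "id", "value": 1},
)
serialized = json.dumps(task)
```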

## Appendix D: Single-value serialization

### Binary single-value serialization