Spec: remove the JSON spec for content file and file scan task sections. (#9771)

These sections shouldn't be part of the core table spec, although the JSON serializer is valuable for FileScanTask serialization. See the discussion thread for more context: https://lists.apache.org/thread/2ty27yx4q0zlqd5h71cyyhb5k47yf9bv
stevenzwu authored Jul 16, 2024
1 parent 5479545 commit ab580b9
Showing 1 changed file with 0 additions and 36 deletions.
36 changes: 0 additions & 36 deletions format/spec.md
@@ -1230,42 +1230,6 @@ Example
] } ]
```

### Content File (Data and Delete) Serialization

Content file (data or delete) is serialized as a JSON object according to the following table.

| Metadata field |JSON representation|Example|
|--------------------------|--- |--- |
| **`spec-id`** |`JSON int`|`1`|
| **`content`** |`JSON string`|`DATA`, `POSITION_DELETES`, `EQUALITY_DELETES`|
| **`file-path`** |`JSON string`|`"s3://b/wh/data.db/table"`|
| **`file-format`** |`JSON string`|`AVRO`, `ORC`, `PARQUET`|
| **`partition`** |`JSON object: Partition data tuple using partition field ids for the struct field ids`|`{"1000":1}`|
| **`record-count`** |`JSON long`|`1`|
| **`file-size-in-bytes`** |`JSON long`|`1024`|
| **`column-sizes`** |`JSON object: Map from column id to the total size on disk of all regions that store the column.`|`{"keys":[3,4],"values":[100,200]}`|
| **`value-counts`** |`JSON object: Map from column id to number of values in the column (including null and NaN values)`|`{"keys":[3,4],"values":[90,180]}`|
| **`null-value-counts`** |`JSON object: Map from column id to number of null values in the column`|`{"keys":[3,4],"values":[10,20]}`|
| **`nan-value-counts`** |`JSON object: Map from column id to number of NaN values in the column`|`{"keys":[3,4],"values":[0,0]}`|
| **`lower-bounds`** |`JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["01000000","02000000"]}`|
| **`upper-bounds`** |`JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["05000000","0A000000"]}`|
| **`key-metadata`** |`JSON string: Encryption key metadata binary serialized as hexadecimal string`|`00000000000000000000000000000000`|
| **`split-offsets`** |`JSON list of long: Split offsets for the data file`|`[128,256]`|
| **`equality-ids`** |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
| **`sort-order-id`** |`JSON int`|`1`|
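
As a rough illustration of the table above, the following Python sketch assembles a content file object from the table's own example values. The helper names `id_map` and `int_bound_hex` are hypothetical and not part of any Iceberg API, and encoding the bounds as 4-byte little-endian values assumes columns 3 and 4 are `int` columns.

```python
import json
import struct


def id_map(values_by_column_id):
    """Encode a {column id: value} map in the {"keys": [...], "values": [...]} form shown in the table."""
    keys = sorted(values_by_column_id)
    return {"keys": keys, "values": [values_by_column_id[k] for k in keys]}


def int_bound_hex(value):
    """Hexadecimal string of a 4-byte little-endian int (assumption: the column is an int)."""
    return struct.pack("<i", value).hex().upper()


# Field names follow the table; values are the table's examples.
content_file = {
    "spec-id": 1,
    "content": "DATA",
    "file-path": "s3://b/wh/data.db/table",
    "file-format": "PARQUET",
    "partition": {"1000": 1},
    "record-count": 1,
    "file-size-in-bytes": 1024,
    "column-sizes": id_map({3: 100, 4: 200}),
    "value-counts": id_map({3: 90, 4: 180}),
    "null-value-counts": id_map({3: 10, 4: 20}),
    "nan-value-counts": id_map({3: 0, 4: 0}),
    "lower-bounds": id_map({3: int_bound_hex(1), 4: int_bound_hex(2)}),
    "upper-bounds": id_map({3: int_bound_hex(5), 4: int_bound_hex(10)}),
    "split-offsets": [128, 256],
    "sort-order-id": 1,
}

serialized = json.dumps(content_file)
```

The maps are written as parallel `keys` and `values` lists rather than as a JSON object keyed by column id, which matches the examples in the table.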

### File Scan Task Serialization

File scan task is serialized as a JSON object according to the following table.

| Metadata field |JSON representation|Example|
|--------------------------|--- |--- |
| **`schema`** |`JSON object`|`See above, read schemas instead`|
| **`spec`** |`JSON object`|`See above, read partition specs instead`|
| **`data-file`** |`JSON object`|`See above, read content file instead`|
| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
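
A file scan task object following the table above could be put together along these lines. This is only a sketch: `file_scan_task` is a hypothetical helper, and the schema, spec, and data file values are minimal placeholders chosen for illustration rather than complete objects.

```python
import json


def file_scan_task(schema, spec, data_file, delete_files, residual_filter):
    """Build a dict with the five fields listed in the table (hypothetical helper)."""
    return {
        "schema": schema,
        "spec": spec,
        "data-file": data_file,
        "delete-files": list(delete_files),
        "residual-filter": residual_filter,
    }


task = file_scan_task(
    # Placeholder schema and partition spec; real ones carry field definitions.
    schema={"type": "struct", "schema-id": 0, "fields": []},
    spec={"spec-id": 0, "fields": []},
    # Placeholder content file with only a few of its fields filled in.
    data_file={"spec-id": 0, "content": "DATA", "file-path": "s3://b/wh/data.db/table"},
    delete_files=[],
    # Residual filter example taken from the table.
    residual_filter={"type": "eq", "term": "id", "value": 1},
)

serialized_task = json.dumps(task)
```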

## Appendix D: Single-value serialization

### Binary single-value serialization