From ab580b9955ade2c4a755d5b8e150058088a48c2a Mon Sep 17 00:00:00 2001
From: Steven Zhen Wu
Date: Mon, 15 Jul 2024 21:02:35 -0700
Subject: [PATCH] Spec: remove the JSON spec for content file and file scan task sections. (#9771)

They shouldn't be part of the core table spec although the JSON serializer is valuable for FileScanTask serialization.

See discussion thread for more context:
https://lists.apache.org/thread/2ty27yx4q0zlqd5h71cyyhb5k47yf9bv
---
 format/spec.md | 36 ------------------------------------
 1 file changed, 36 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 9a3c16e3ac91..dd4e901f3706 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -1230,42 +1230,6 @@ Example
 ] } ]
 ```
 
-### Content File (Data and Delete) Serialization
-
-Content file (data or delete) is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`spec-id`** |`JSON int`|`1`|
-| **`content`** |`JSON string`|`DATA`, `POSITION_DELETES`, `EQUALITY_DELETES`|
-| **`file-path`** |`JSON string`|`"s3://b/wh/data.db/table"`|
-| **`file-format`** |`JSON string`|`AVRO`, `ORC`, `PARQUET`|
-| **`partition`** |`JSON object: Partition data tuple using partition field ids for the struct field ids`|`{"1000":1}`|
-| **`record-count`** |`JSON long`|`1`|
-| **`file-size-in-bytes`** |`JSON long`|`1024`|
-| **`column-sizes`** |`JSON object: Map from column id to the total size on disk of all regions that store the column.`|`{"keys":[3,4],"values":[100,200]}`|
-| **`value-counts`** |`JSON object: Map from column id to number of values in the column (including null and NaN values)`|`{"keys":[3,4],"values":[90,180]}`|
-| **`null-value-counts`** |`JSON object: Map from column id to number of null values in the column`|`{"keys":[3,4],"values":[10,20]}`|
-| **`nan-value-counts`** |`JSON object: Map from column id to number of NaN values in the column`|`{"keys":[3,4],"values":[0,0]}`|
-| **`lower-bounds`** |`JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["01000000","02000000"]}`|
-| **`upper-bounds`** |`JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["05000000","0A000000"]}`|
-| **`key-metadata`** |`JSON string: Encryption key metadata binary serialized as hexadecimal string`|`00000000000000000000000000000000`|
-| **`split-offsets`** |`JSON list of long: Split offsets for the data file`|`[128,256]`|
-| **`equality-ids`** |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
-| **`sort-order-id`** |`JSON int`|`1`|
-
-### File Scan Task Serialization
-
-File scan task is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`schema`** |`JSON object`|`See above, read schemas instead`|
-| **`spec`** |`JSON object`|`See above, read partition specs instead`|
-| **`data-file`** |`JSON object`|`See above, read content file instead`|
-| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
-| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
-
 ## Appendix D: Single-value serialization
 
 ### Binary single-value serialization