From 9e22d51d0b883cba5c7864b134edc978b81cf193 Mon Sep 17 00:00:00 2001
From: Steven Wu
Date: Wed, 21 Feb 2024 15:41:27 -0800
Subject: [PATCH] Spec: remove the JSON spec for content file and file scan task sections. They shouldn't be part of the core table spec although the JSON serializer is valuable for FileScanTask serialization. See discussion thread for more context: https://lists.apache.org/thread/2ty27yx4q0zlqd5h71cyyhb5k47yf9bv

---
 format/spec.md | 36 ------------------------------------
 1 file changed, 36 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 9a3c16e3ac91..dd4e901f3706 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -1230,42 +1230,6 @@ Example
 ] } ]
 ```
 
-### Content File (Data and Delete) Serialization
-
-Content file (data or delete) is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`spec-id`** |`JSON int`|`1`|
-| **`content`** |`JSON string`|`DATA`, `POSITION_DELETES`, `EQUALITY_DELETES`|
-| **`file-path`** |`JSON string`|`"s3://b/wh/data.db/table"`|
-| **`file-format`** |`JSON string`|`AVRO`, `ORC`, `PARQUET`|
-| **`partition`** |`JSON object: Partition data tuple using partition field ids for the struct field ids`|`{"1000":1}`|
-| **`record-count`** |`JSON long`|`1`|
-| **`file-size-in-bytes`** |`JSON long`|`1024`|
-| **`column-sizes`** |`JSON object: Map from column id to the total size on disk of all regions that store the column.`|`{"keys":[3,4],"values":[100,200]}`|
-| **`value-counts`** |`JSON object: Map from column id to number of values in the column (including null and NaN values)`|`{"keys":[3,4],"values":[90,180]}`|
-| **`null-value-counts`** |`JSON object: Map from column id to number of null values in the column`|`{"keys":[3,4],"values":[10,20]}`|
-| **`nan-value-counts`** |`JSON object: Map from column id to number of NaN values in the column`|`{"keys":[3,4],"values":[0,0]}`|
-| **`lower-bounds`** |`JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["01000000","02000000"]}`|
-| **`upper-bounds`** |`JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["05000000","0A000000"]}`|
-| **`key-metadata`** |`JSON string: Encryption key metadata binary serialized as hexadecimal string`|`00000000000000000000000000000000`|
-| **`split-offsets`** |`JSON list of long: Split offsets for the data file`|`[128,256]`|
-| **`equality-ids`** |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
-| **`sort-order-id`** |`JSON int`|`1`|
-
-### File Scan Task Serialization
-
-File scan task is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`schema`** |`JSON object`|`See above, read schemas instead`|
-| **`spec`** |`JSON object`|`See above, read partition specs instead`|
-| **`data-file`** |`JSON object`|`See above, read content file instead`|
-| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
-| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
-
 ## Appendix D: Single-value serialization
 
 ### Binary single-value serialization