This repository has been archived by the owner on Oct 9, 2018. It is now read-only.

add formats translate #40

Open: wants to merge 1 commit into base: master
12 changes: 7 additions & 5 deletions zh/formats/capnproto.md
@@ -2,16 +2,17 @@

# CapnProto

Cap'n Proto is a binary message format similar to Protocol Buffers and Thrift, but not like JSON or MessagePack.

Cap'n Proto messages are strictly typed and not self-describing, meaning they need an external schema description. The schema is applied on the fly and cached for each query.

```sql
SELECT SearchPhrase, count() AS c FROM test.hits
GROUP BY SearchPhrase FORMAT CapnProto SETTINGS schema = 'schema:Message'
```

Where `schema.capnp` looks like this:

```
struct Message {
  ...
}
```

Schema files are located in the directory specified by [format_schema_path](../operations/server_settings/settings.md#server_settings-format_schema_path) in the server configuration.

Deserialization is efficient and usually doesn't increase the system load.

8 changes: 4 additions & 4 deletions zh/formats/csv.md
@@ -1,10 +1,10 @@
# CSV

Comma Separated Values format ([RFC](https://tools.ietf.org/html/rfc4180)).

When formatting, strings are enclosed in double quotes. A double quote inside a string is output as two double quotes in a row. There are no other rules for escaping characters. Date and date-time values are enclosed in double quotes. Numbers are output without quotes. Values are separated by commas. Rows are separated using the Unix line feed (LF). Arrays are serialized in CSV as follows: first the array is serialized to a string as in TabSeparated format, and then the resulting string is output to CSV in double quotes. Tuples in CSV format are serialized as separate columns (that is, their nesting in the tuple is lost).
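A minimal Python sketch of the output rules above, assuming Python values stand in for ClickHouse columns (`str` for String, `int`/`float` for numbers, `date`/`datetime` for Date/DateTime, `list` for Array, `tuple` for Tuple). The helper names are hypothetical, and array elements are limited to numbers because string elements would need TabSeparated escaping, which is not shown here.

```python
from datetime import date, datetime


def quote(text):
    # Enclose in double quotes; a double quote inside is written as two in a row.
    return '"' + text.replace('"', '""') + '"'


def format_fields(value):
    """Return the CSV field(s) for one value; a tuple expands to several fields."""
    if isinstance(value, tuple):
        # Tuples become separate columns, so their nesting is lost.
        fields = []
        for item in value:
            fields.extend(format_fields(item))
        return fields
    if isinstance(value, (int, float)):
        return [str(value)]  # numbers are written without quotes
    if isinstance(value, datetime):  # checked before date: datetime subclasses date
        return [quote(value.strftime("%Y-%m-%d %H:%M:%S"))]
    if isinstance(value, date):
        return [quote(value.isoformat())]
    if isinstance(value, list):
        # Serialize the array as in TabSeparated ([1,2,3]), then quote the result.
        return [quote("[" + ",".join(str(v) for v in value) + "]")]
    return [quote(str(value))]  # strings


def format_csv_row(values):
    fields = []
    for value in values:
        fields.extend(format_fields(value))
    return ",".join(fields) + "\n"  # values separated by commas, rows end with LF


print(format_csv_row(['say "hi"', 42, date(2018, 10, 9), [1, 2, 3], (1, "a")]), end="")
# prints: "say ""hi""",42,"2018-10-09","[1,2,3]",1,"a"
```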

When parsing, all values can be parsed either with or without quotes. Both double and single quotes are supported. Strings can also be written without quotes; in this case, they are parsed up to a comma or line feed (CR or LF). In violation of the RFC, when parsing unquoted strings, leading and trailing spaces and tabs are ignored. For the line feed, Unix (LF), Windows (CR LF), and Mac OS Classic (LF CR) are all supported.
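The parsing rules can be sketched as a single-record splitter in Python. This is an illustrative assumption, not ClickHouse's parser: it returns every field as a string, handles one line at a time (so quoted values containing line breaks are not supported), and `parse_csv_line` is a hypothetical name.

```python
def parse_csv_line(line):
    """Split one CSV record into string fields."""
    line = line.rstrip("\r\n")  # tolerate LF, CR LF and LF CR line endings
    fields = []
    pos, n = 0, len(line)
    while True:
        # Leading spaces and tabs before a field are skipped.
        while pos < n and line[pos] in " \t":
            pos += 1
        if pos < n and line[pos] in "\"'":
            # Quoted value: either quote character, doubled inside to escape it.
            quote_char = line[pos]
            pos += 1
            chunks = []
            while True:
                end = line.index(quote_char, pos)  # ValueError if unterminated
                chunks.append(line[pos:end])
                pos = end + 1
                if pos < n and line[pos] == quote_char:
                    chunks.append(quote_char)
                    pos += 1
                else:
                    break
            value = "".join(chunks)
            while pos < n and line[pos] in " \t":
                pos += 1
        else:
            # Unquoted value: read up to the next comma and trim trailing spaces
            # and tabs (the deviation from RFC 4180 described above).
            end = line.find(",", pos)
            end = n if end == -1 else end
            value = line[pos:end].rstrip(" \t")
            pos = end
        fields.append(value)
        if pos >= n:
            return fields
        if line[pos] != ",":
            raise ValueError("expected ',' at position %d" % pos)
        pos += 1
```

For example, `parse_csv_line('  abc , "x ""y""", \'it\'\'s\',42\n')` yields `["abc", 'x "y"', "it's", "42"]`.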

The CSV format supports the output of totals and extremes the same way as `TabSeparated`.

2 changes: 1 addition & 1 deletion zh/formats/csvwithnames.md
@@ -1,4 +1,4 @@
# CSVWithNames

Also prints the header row, similar to `TabSeparatedWithNames`.

2 changes: 1 addition & 1 deletion zh/formats/index.md
@@ -2,7 +2,7 @@

# Formats

The format determines how data is returned to you after SELECTs (how it is written and formatted by the server), and how it is accepted for INSERTs (how it is read and parsed by the server).

```eval_rst
.. toctree::
   ...
```
18 changes: 9 additions & 9 deletions zh/formats/json.md
@@ -1,6 +1,6 @@
# JSON

Outputs data in JSON format. Besides the data itself, it also outputs column names and types, along with some additional information: the total number of output rows, and the number of rows that could have been output if there weren't a LIMIT. Example:

```sql
SELECT SearchPhrase, count() AS c FROM test.hits GROUP BY SearchPhrase WITH TOTALS ORDER BY c DESC LIMIT 5 FORMAT JSON
...
}
```

The JSON is compatible with JavaScript. To ensure this, some characters are additionally escaped: the slash `/` is escaped as `\/`; the alternative line breaks `U+2028` and `U+2029`, which break some browsers, are escaped as `\uXXXX`. ASCII control characters are escaped: backspace, form feed, line feed, carriage return, and horizontal tab are replaced with `\b`, `\f`, `\n`, `\r`, `\t`, and the remaining bytes in the 00-1F range are replaced with `\uXXXX` sequences. Invalid UTF-8 sequences are changed to the replacement character � so the output text consists of valid UTF-8 sequences. For compatibility with JavaScript, Int64 and UInt64 integers are enclosed in double quotes by default. To remove the quotes, set the configuration parameter `output_format_json_quote_64bit_integers` to 0.

`rows` – The total number of output rows.

`rows_before_limit_at_least` – The minimal number of rows there would have been without LIMIT. Output only if the query contains LIMIT.
If the query contains GROUP BY, `rows_before_limit_at_least` is the exact number of rows there would have been without a LIMIT.

`totals` – Total values (when using WITH TOTALS).

`extremes` – Extreme values (when `extremes` is set to 1).

This format is only appropriate for outputting a query result, not for parsing (retrieving data to insert in a table).
See also the `JSONEachRow` format.

8 changes: 4 additions & 4 deletions zh/formats/jsoncompact.md
@@ -1,8 +1,8 @@
# JSONCompact

Differs from JSON only in that data rows are output in arrays, not in objects.

Example:

```json
{
    ...
}
```

This format is only appropriate for outputting a query result, not for parsing (retrieving data to insert in a table).
See also the `JSONEachRow` format.
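To make the difference concrete, here is a small Python sketch that converts a JSON-format result into the JSONCompact row layout. It assumes the column names and types are listed under a `meta` key, which is not visible in the excerpt above; the function name and the sample values are illustrative only.

```python
import json


def to_compact(result):
    """Turn a JSON-format result into JSONCompact-style rows (arrays, not objects)."""
    # Assumes columns are listed under "meta" as {"name": ..., "type": ...} entries.
    # Only "data" is converted here; "totals" and "extremes" are left untouched.
    names = [column["name"] for column in result["meta"]]
    compact = dict(result)  # shallow copy; other keys such as "rows" are kept
    compact["data"] = [[row[name] for name in names] for row in result["data"]]
    return compact


sample = {  # hypothetical input, shaped like the JSON example
    "meta": [{"name": "SearchPhrase", "type": "String"}, {"name": "c", "type": "UInt64"}],
    "data": [{"SearchPhrase": "", "c": "8267016"}, {"SearchPhrase": "baku", "c": "1000"}],
    "rows": 2,
}
print(json.dumps(to_compact(sample)["data"]))
# prints: [["", "8267016"], ["baku", "1000"]]
```

The column order comes from the metadata rather than from each object's key order, which is what lets the compact form drop the keys entirely.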

6 changes: 3 additions & 3 deletions zh/formats/jsoneachrow.md
@@ -1,6 +1,6 @@
# JSONEachRow

Outputs data as separate JSON objects for each row (newline delimited JSON).

```json
{"SearchPhrase":"","count()":"8267016"}
...
{"SearchPhrase":"baku","count()":"1000"}
```

Unlike the JSON format, there is no substitution of invalid UTF-8 sequences. Any set of bytes can be output in the rows. This is necessary so that data can be formatted without losing any information. Values are escaped in the same way as for JSON.

For parsing, the values of different columns may appear in any order. Some values may be omitted; they are treated as equal to their default values. In this case, zeros and empty strings are used as default values. Complex default values that could be specified in the table are not supported. Whitespace between elements is ignored. If a comma is placed after the objects, it is ignored. Objects don't necessarily have to be separated by new lines.
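The parsing rules can be illustrated with a short Python sketch built on the standard library's `json.JSONDecoder.raw_decode`. It is an approximation under stated assumptions: `parse_json_each_row` is a hypothetical name, only numeric and String defaults are modelled, and unknown keys are simply copied into the row rather than rejected.

```python
import json


def parse_json_each_row(text, columns):
    """Parse a JSONEachRow stream into dicts; columns maps name -> type name."""
    decoder = json.JSONDecoder()
    # Omitted values get a default: "" for String, 0 for anything else
    # (only numeric and String columns are in scope for this sketch).
    defaults = {name: "" if type_name == "String" else 0
                for name, type_name in columns.items()}
    rows, pos = [], 0
    while True:
        # Whitespace and commas between objects are skipped, so objects need not
        # be separated by line breaks.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            return rows
        obj, pos = decoder.raw_decode(text, pos)
        row = dict(defaults)
        row.update(obj)  # keys may come in any order
        rows.append(row)
```

For instance, the stream `{"c": "5", "SearchPhrase": "a"}, {"SearchPhrase": "b"}` followed by a newline and `{"c": 7}` produces three rows, the second with `c` defaulted to `0` and the third with `SearchPhrase` defaulted to `""`.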

5 changes: 2 additions & 3 deletions zh/formats/native.md
@@ -1,6 +1,5 @@
# Native

The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is "columnar": it doesn't convert columns to rows. This is the format used in the native interface for interaction between servers, by the command-line client, and by C++ clients.

You can use this format to quickly generate dumps that can only be read by the ClickHouse DBMS. It doesn't make sense to work with this format yourself.

4 changes: 2 additions & 2 deletions zh/formats/null.md
@@ -1,5 +1,5 @@
# Null

Nothing is output. However, the query is processed, and when using the command-line client, data is transmitted to the client. This is used for tests, including performance testing.
Obviously, this format is only appropriate for output, not for parsing.

12 changes: 6 additions & 6 deletions zh/formats/pretty.md
@@ -1,12 +1,12 @@
# Pretty

Outputs data as Unicode-art tables, also using ANSI-escape sequences for setting colors in the terminal.
A full grid of the table is drawn, and each row occupies two lines in the terminal.
Each result block is output as a separate table. This is necessary so that blocks can be output without buffering results (buffering would be necessary in order to pre-calculate the visible width of all the values).
To avoid dumping too much data to the terminal, only the first 10,000 rows are printed. If the number of rows is greater than or equal to 10,000, the message "Showed first 10 000" is printed.
This format is only appropriate for outputting a query result, not for parsing (retrieving data to insert in a table).

The Pretty format supports outputting total values (when using WITH TOTALS) and extremes (when `extremes` is set to 1). In these cases, total values and extreme values are output after the main data, in separate tables. Example (shown for the PrettyCompact format):

```sql
SELECT EventDate, count() AS c FROM test.hits GROUP BY EventDate WITH TOTALS ORDER BY EventDate FORMAT PrettyCompact
```
3 changes: 1 addition & 2 deletions zh/formats/prettycompact.md
@@ -1,5 +1,4 @@
# PrettyCompact

Differs from `Pretty` in that no grid lines are drawn between rows, so the result is more compact.
This format is used by default in the command-line client in interactive mode.

2 changes: 1 addition & 1 deletion zh/formats/prettycompactmonoblock.md
@@ -1,4 +1,4 @@
# PrettyCompactMonoBlock

Differs from `PrettyCompact` in that up to 10,000 rows are buffered, then output as a single table, not by blocks.

10 changes: 5 additions & 5 deletions zh/formats/prettynoescapes.md
@@ -1,20 +1,20 @@
# PrettyNoEscapes

Differs from `Pretty` in that ANSI-escape sequences aren't used. This is necessary for displaying this format in a browser, as well as for using the `watch` command-line utility.

Example:

```bash
watch -n1 "clickhouse-client --query='SELECT * FROM system.events FORMAT PrettyCompactNoEscapes'"
```

You can use the HTTP interface for displaying in the browser.

## PrettyCompactNoEscapes

The same as the previous format.

## PrettySpaceNoEscapes

The same as the previous format.

2 changes: 1 addition & 1 deletion zh/formats/prettyspace.md
@@ -1,4 +1,4 @@
# PrettySpace

Differs from `PrettyCompact` in that whitespace (space characters) is used instead of the grid.

16 changes: 8 additions & 8 deletions zh/formats/rowbinary.md
@@ -1,13 +1,13 @@
# RowBinary

Formats and parses data by row in binary format. Rows and values are listed consecutively, without separators.
This format is less efficient than the Native format, since it is row-based.

Integers use fixed-length little-endian representation. For example, UInt64 uses 8 bytes.
DateTime is represented as UInt32 containing the Unix timestamp as the value.
Date is represented as a UInt16 object that contains the number of days since 1970-01-01 as the value.
String is represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the bytes of the string.
FixedString is represented simply as a sequence of bytes.
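A Python sketch of these encodings using the standard `struct` module, assuming the values have already been converted to Python types. The `encode_*` helper names are hypothetical, and the zero-byte padding for FixedString is an assumption not stated in the text above.

```python
import struct
from datetime import date, datetime, timezone


def encode_varint(n):
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_uint64(value):
    return struct.pack("<Q", value)  # fixed 8 bytes, little-endian


def encode_datetime(moment):
    # UInt32 Unix timestamp; pass a timezone-aware datetime to avoid local-time shifts.
    return struct.pack("<I", int(moment.timestamp()))


def encode_date(day):
    # UInt16 day count since 1970-01-01.
    return struct.pack("<H", (day - date(1970, 1, 1)).days)


def encode_string(data):
    return encode_varint(len(data)) + data  # length prefix, then the raw bytes


def encode_fixed_string(data, size):
    # Assumption: shorter values are padded with zero bytes up to the declared size.
    if len(data) > size:
        raise ValueError("value longer than FixedString size")
    return data.ljust(size, b"\x00")


# One row of (SearchPhrase String, c UInt64): values are written back to back.
row = encode_string(b"baku") + encode_uint64(1000)
stamp = encode_datetime(datetime(2018, 10, 9, tzinfo=timezone.utc))
```

Because there are no separators, a reader must know the column types in advance to find where each value ends.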

Arrays are represented as a varint length (unsigned [LEB128](https://en.wikipedia.org/wiki/LEB128)), followed by the array elements in order.
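As a sketch, here is how an `Array(UInt32)` value could be written and read back in Python; the element type is an assumption chosen for illustration, and the helper names are hypothetical. Elements of other types would use their own RowBinary encodings in place of the 4-byte little-endian integers.

```python
import struct


def encode_varint(n):
    """Unsigned LEB128 encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Read an unsigned LEB128 number; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_array_uint32(values):
    # Element count as a varint, then each UInt32 element in order (little-endian).
    return encode_varint(len(values)) + b"".join(struct.pack("<I", v) for v in values)


def decode_array_uint32(buf, pos=0):
    count, pos = decode_varint(buf, pos)
    values = list(struct.unpack_from("<%dI" % count, buf, pos))
    return values, pos + 4 * count
```

Returning the end position from the decoder lets a caller continue reading the next value of the same row from the same buffer.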
