From fb2c0e0e73d72a4f437bd1ed8f682882f667cd4e Mon Sep 17 00:00:00 2001 From: Katya Macedo Date: Tue, 28 Nov 2023 15:51:22 -0600 Subject: [PATCH 1/5] Add decode_base64 functions --- docs/querying/sql-functions.md | 16 +++++++++ docs/querying/sql-scalar.md | 64 ++++++++++++++++++++++++++-------- 2 files changed, 66 insertions(+), 14 deletions(-) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 8e43076518db..450f582140a7 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -504,6 +504,22 @@ Returns the current timestamp in the connection's time zone. Rounds down a timestamp by a given time unit. +## DECODE_BASE64_COMPLEX + +`DECODE_BASE64_COMPLEX(expr1, expr2)` + +**Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) + +Decodes complex expressions represented as literals, where `expr1` is a string literal with a valid [complex type name](sql-scalar.md#complex-type-expressions) and `expr2` is a base64-encoded string that contains a serialized value of the type defined in `expr1`. + +## DECODE_BASE64_UTF8 + +`DECODE_BASE64_UTF8(expr)` + +**Function type:** [Scalar, string](sql-scalar.md#string-functions) + +Decodes base64-encoded strings into UTF-8 encoded strings. + ## DEGREES `DEGREES()` diff --git a/docs/querying/sql-scalar.md b/docs/querying/sql-scalar.md index c9409dd07bfd..8bd55ae4cb70 100644 --- a/docs/querying/sql-scalar.md +++ b/docs/querying/sql-scalar.md @@ -93,35 +93,36 @@ String functions accept strings, and return a type appropriate to the function. |--------|-----| |`CONCAT(expr, expr...)`|Concats a list of expressions. 
Also see the [concatenation operator](sql-operators.md#concatenation-operator).| |`TEXTCAT(expr, expr)`|Two argument version of `CONCAT`.| -|`STRING_FORMAT(pattern, [args...])`|Returns a string formatted in the manner of Java's [String.format](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#format-java.lang.String-java.lang.Object...-).| +|`CONTAINS_STRING(expr, str)`|Returns true if the `str` is a substring of `expr`.| +|`ICONTAINS_STRING(expr, str)`|Returns true if the `str` is a substring of `expr`. The match is case-insensitive.| +|`DECODE_BASE64_UTF8(expr)`|Decodes base64-encoded strings into UTF-8 encoded strings.| +|`LEFT(expr, [length])`|Returns the leftmost length characters from `expr`.| +|`RIGHT(expr, [length])`|Returns the rightmost length characters from `expr`.| |`LENGTH(expr)`|Length of `expr` in UTF-16 code units.| |`CHAR_LENGTH(expr)`|Alias for `LENGTH`.| |`CHARACTER_LENGTH(expr)`|Alias for `LENGTH`.| |`STRLEN(expr)`|Alias for `LENGTH`.| -|`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a registered [query-time lookup table](lookups.md). Note that lookups can also be queried directly using the [`lookup` schema](sql.md#from). Optional constant replaceMissingValueWith can be passed as 3rd argument to be returned when value is missing from lookup.| +|`LOOKUP(expr, lookupName, [replaceMissingValueWith])`|Look up `expr` in a registered [query-time lookup table](lookups.md). Note that lookups can also be queried directly using the [`lookup` schema](sql.md#from). Optional constant `replaceMissingValueWith` can be passed as a third argument to be returned when value is missing from lookup.| |`LOWER(expr)`|Returns `expr` in all lowercase.| |`UPPER(expr)`|Returns `expr` in all uppercase.| +|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` left-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. 
The result will be null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added, however `expr` may be trimmed if necessary.| +|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr` right-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result will be null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added, however `expr` may be trimmed if necessary.| |`PARSE_LONG(string, [radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.| |`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of `needle` within `haystack`, with indexes starting from 1. The search will begin at `fromIndex`, or 1 if `fromIndex` is not specified. If `needle` is not found, returns 0.| |`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` to `expr` and extract a capture group, or `NULL` if there is no match. If index is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Note: when `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both will return `NULL`).| |`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Similar to [`LIKE`](sql-operators.md#logical-operators), but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses.| |`REGEXP_REPLACE(expr, pattern, replacement)`|Replaces all occurrences of regular expression `pattern` within `expr` with `replacement`. 
The replacement string may refer to capture groups using `$1`, `$2`, etc. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern.| -|`CONTAINS_STRING(expr, str)`|Returns true if the `str` is a substring of `expr`.| -|`ICONTAINS_STRING(expr, str)`|Returns true if the `str` is a substring of `expr`. The match is case-insensitive.| |`REPLACE(expr, pattern, replacement)`|Replaces pattern with replacement in `expr`, and returns the result.| +|`REPEAT(expr, [N])`|Repeats `expr` N times.| +|`REVERSE(expr)`|Reverses `expr`.| +|`STRING_FORMAT(pattern, [args...])`|Returns a string formatted in the manner of Java's [String.format](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#format-java.lang.String-java.lang.Object...-).| |`STRPOS(haystack, needle)`|Returns the index of `needle` within `haystack`, with indexes starting from 1. If `needle` is not found, returns 0.| |`SUBSTRING(expr, index, [length])`|Returns a substring of `expr` starting at index, with a max length, both measured in UTF-16 code units.| -|`RIGHT(expr, [length])`|Returns the rightmost length characters from `expr`.| -|`LEFT(expr, [length])`|Returns the leftmost length characters from `expr`.| |`SUBSTR(expr, index, [length])`|Alias for `SUBSTRING`.| -|`TRIM([BOTH `|` LEADING `|` TRAILING] [chars FROM] expr)`|Returns `expr` with characters removed from the leading, trailing, or both ends of "expr" if they are in "chars". If "chars" is not provided, it defaults to " " (a space). If the directional argument is not provided, it defaults to "BOTH".| +|`TRIM([BOTH `|` LEADING `|` TRAILING] [chars FROM] expr)`|Returns `expr` with characters removed from the leading, trailing, or both ends of `expr` if they are in `chars`. If `chars` is not provided, it defaults to `' '` (a space).
If the directional argument is not provided, it defaults to `BOTH`.| |`BTRIM(expr, [chars])`|Alternate form of `TRIM(BOTH chars FROM expr)`.| |`LTRIM(expr, [chars])`|Alternate form of `TRIM(LEADING chars FROM expr)`.| |`RTRIM(expr, [chars])`|Alternate form of `TRIM(TRAILING chars FROM expr)`.| -|`REVERSE(expr)`|Reverses `expr`.| -|`REPEAT(expr, [N])`|Repeats `expr` N times| -|`LPAD(expr, length, [chars])`|Returns a string of `length` from `expr` left-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result will be null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added, however `expr` may be trimmed if necessary.| -|`RPAD(expr, length, [chars])`|Returns a string of `length` from `expr` right-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result will be null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added, however `expr` may be trimmed if necessary.| ## Date and time functions @@ -268,11 +269,46 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi |Function|Notes| |--------|-----| -|`CAST(value AS TYPE)`|Cast value to another type. See [Data types](sql-data-types.md) for details about how Druid SQL handles CAST.| +|`BLOOM_FILTER_TEST(expr, serialized-filter)`|Returns true if the value of `expr` is contained in the base64-serialized Bloom filter. See the [Bloom filter extension](../development/extensions-core/bloom-filter.md) documentation for additional details. See the [`BLOOM_FILTER` function](sql-aggregations.md) for computing Bloom filters.| |`CASE expr WHEN value1 THEN result1 \[ WHEN value2 THEN result2 ... \] \[ ELSE resultN \] END`|Simple CASE.| |`CASE WHEN boolean_expr1 THEN result1 \[ WHEN boolean_expr2 THEN result2 ... 
\] \[ ELSE resultN \] END`|Searched CASE.| -|`NULLIF(value1, value2)`|Returns NULL if value1 and value2 match, else returns value1.| +|`CAST(value AS TYPE)`|Cast value to another type. See [Data types](sql-data-types.md) for details about how Druid SQL handles CAST.| |`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.| +|`DECODE_BASE64_COMPLEX(expr1, expr2)`| Decodes complex expressions represented as literals, where `expr1` is a string literal with a valid [complex type name](#complex-type-expressions) and `expr2` is a base64-encoded string that contains a serialized value of the type defined in `expr1`.| +|`NULLIF(value1, value2)`|Returns NULL if `value1` and `value2` match, else returns `value1`.| |`NVL(value1, value2)`|Returns `value1` if `value1` is not null, otherwise `value2`.| -|`BLOOM_FILTER_TEST(expr, serialized-filter)`|Returns true if the value of `expr` is contained in the Base64-serialized Bloom filter. See the [Bloom filter extension](../development/extensions-core/bloom-filter.md) documentation for additional details. 
See the [`BLOOM_FILTER` function](sql-aggregations.md) for computing Bloom filters.| +### Complex type expressions + +The following is a non-exhaustive list of complex type expressions supported by Druid SQL: + +Built-in: + * hyperUnique + * serializablePairLongString + +Bloom filter: + * bloom + +DataSketches: + * arrayOfDoublesSketch + * HLLSketch + * KllDoublesSketch + * KllFloatsSketch + * quantilesDoublesSketch + * thetaSketch + +Histogram: + * approximateHistogram + * fixedBucketsHistogram + +Stats: + * variance + +Compressed big decimal: + * compressedBigDecimal + +Moment sketch: + * momentSketch + +T-digest sketch: + * tDigestSketch \ No newline at end of file From 8c2c1955253bbbab3dec4737a7872505dfdf3015 Mon Sep 17 00:00:00 2001 From: Katya Macedo Date: Fri, 1 Dec 2023 10:29:35 -0600 Subject: [PATCH 2/5] Update function description --- docs/querying/sql-functions.md | 7 ++++--- docs/querying/sql-scalar.md | 8 ++++---- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index 450f582140a7..e17c51233243 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -506,11 +506,11 @@ Rounds down a timestamp by a given time unit. ## DECODE_BASE64_COMPLEX -`DECODE_BASE64_COMPLEX(expr1, expr2)` +`DECODE_BASE64_COMPLEX(dataType, expr)` **Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) -Decodes complex expressions represented as literals, where `expr1` is a string literal with a valid [complex type name](sql-scalar.md#complex-type-expressions) and `expr2` is a base64-encoded string that contains a serialized value of the type defined in `expr1`. +Decodes a Base64-encoded string into a [complex type](sql-scalar.md#complex-type-names), where `dataType` represents the complex type and `expr` is the Base64-encoded string to decode. 
## DECODE_BASE64_UTF8 @@ -518,7 +518,8 @@ Decodes complex expressions represented as literals, where `expr1` is a string l **Function type:** [Scalar, string](sql-scalar.md#string-functions) -Decodes base64-encoded strings into UTF-8 encoded strings. + +Decodes a Base64-encoded string into a UTF-8 encoded string. ## DEGREES diff --git a/docs/querying/sql-scalar.md b/docs/querying/sql-scalar.md index 8bd55ae4cb70..1e83b3fb8b93 100644 --- a/docs/querying/sql-scalar.md +++ b/docs/querying/sql-scalar.md @@ -95,7 +95,7 @@ String functions accept strings, and return a type appropriate to the function. |`TEXTCAT(expr, expr)`|Two argument version of `CONCAT`.| |`CONTAINS_STRING(expr, str)`|Returns true if the `str` is a substring of `expr`.| |`ICONTAINS_STRING(expr, str)`|Returns true if the `str` is a substring of `expr`. The match is case-insensitive.| -|`DECODE_BASE64_UTF8(expr)`|Decodes base64-encoded strings into UTF-8 encoded strings.| +|`DECODE_BASE64_UTF8(expr)`|Decodes a Base64-encoded string into a UTF-8 encoded string.| |`LEFT(expr, [length])`|Returns the leftmost length characters from `expr`.| |`RIGHT(expr, [length])`|Returns the rightmost length characters from `expr`.| |`LENGTH(expr)`|Length of `expr` in UTF-16 code units.| @@ -274,13 +274,13 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi |`CASE WHEN boolean_expr1 THEN result1 \[ WHEN boolean_expr2 THEN result2 ... \] \[ ELSE resultN \] END`|Searched CASE.| |`CAST(value AS TYPE)`|Cast value to another type. 
See [Data types](sql-data-types.md) for details about how Druid SQL handles CAST.| |`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.| -|`DECODE_BASE64_COMPLEX(expr1, expr2)`| Decodes complex expressions represented as literals, where `expr1` is a string literal with a valid [complex type name](#complex-type-expressions) and `expr2` is a base64-encoded string that contains a serialized value of the type defined in `expr1`.| +|`DECODE_BASE64_COMPLEX(dataType, expr)`| Decodes a Base64-encoded string into a [complex type](sql-scalar.md#complex-type-names), where `dataType` represents the complex type and `expr` is the Base64-encoded string to decode.| |`NULLIF(value1, value2)`|Returns NULL if `value1` and `value2` match, else returns `value1`.| |`NVL(value1, value2)`|Returns `value1` if `value1` is not null, otherwise `value2`.| -### Complex type expressions +### Complex type names -The following is a non-exhaustive list of complex type expressions supported by Druid SQL: +The `DECODE_BASE64_COMPLEX` function accepts the following complex type names: Built-in: * hyperUnique From 2d93ebe6dfe74bd6ee52185f2b905a528749846e Mon Sep 17 00:00:00 2001 From: Katya Macedo Date: Fri, 1 Dec 2023 14:11:00 -0600 Subject: [PATCH 3/5] Add code font, update list intro --- docs/querying/sql-scalar.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/docs/querying/sql-scalar.md b/docs/querying/sql-scalar.md index 1e83b3fb8b93..6c2d66927b7c 100644 --- a/docs/querying/sql-scalar.md +++ b/docs/querying/sql-scalar.md @@ -280,35 +280,35 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi ### Complex type names -The `DECODE_BASE64_COMPLEX` function accepts the following complex type names: +The `DECODE_BASE64_COMPLEX` function accepts the following complex type names as arguments for the `dataType` parameter: Built-in: - * hyperUnique - * serializablePairLongString + 
* `hyperUnique` + * `serializablePairLongString` Bloom filter: - * bloom + * `bloom` DataSketches: - * arrayOfDoublesSketch - * HLLSketch - * KllDoublesSketch - * KllFloatsSketch - * quantilesDoublesSketch - * thetaSketch + * `arrayOfDoublesSketch` + * `HLLSketch` + * `KllDoublesSketch` + * `KllFloatsSketch` + * `quantilesDoublesSketch` + * `thetaSketch` Histogram: - * approximateHistogram - * fixedBucketsHistogram + * `approximateHistogram` + * `fixedBucketsHistogram` Stats: - * variance + * `variance` Compressed big decimal: - * compressedBigDecimal + * `compressedBigDecimal` Moment sketch: - * momentSketch + * `momentSketch` T-digest sketch: - * tDigestSketch \ No newline at end of file + * `tDigestSketch` \ No newline at end of file From 6f928ce7d66c1cb8eaec7a6395bbd8e4fbec4b25 Mon Sep 17 00:00:00 2001 From: Katya Macedo Date: Fri, 8 Dec 2023 11:37:06 -0600 Subject: [PATCH 4/5] Update function description --- docs/querying/sql-functions.md | 2 +- docs/querying/sql-scalar.md | 39 ++-------------------------------- 2 files changed, 3 insertions(+), 38 deletions(-) diff --git a/docs/querying/sql-functions.md b/docs/querying/sql-functions.md index e17c51233243..8ffa3df02672 100644 --- a/docs/querying/sql-functions.md +++ b/docs/querying/sql-functions.md @@ -510,7 +510,7 @@ Rounds down a timestamp by a given time unit. **Function type:** [Scalar, other](sql-scalar.md#other-scalar-functions) -Decodes a Base64-encoded string into a [complex type](sql-scalar.md#complex-type-names), where `dataType` represents the complex type and `expr` is the Base64-encoded string to decode. +Decodes a Base64-encoded string into a complex data type, where `dataType` is the complex data type and `expr` is the Base64-encoded string to decode. 
## DECODE_BASE64_UTF8 diff --git a/docs/querying/sql-scalar.md b/docs/querying/sql-scalar.md index 6c2d66927b7c..5c5538b06db8 100644 --- a/docs/querying/sql-scalar.md +++ b/docs/querying/sql-scalar.md @@ -274,41 +274,6 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi |`CASE WHEN boolean_expr1 THEN result1 \[ WHEN boolean_expr2 THEN result2 ... \] \[ ELSE resultN \] END`|Searched CASE.| |`CAST(value AS TYPE)`|Cast value to another type. See [Data types](sql-data-types.md) for details about how Druid SQL handles CAST.| |`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.| -|`DECODE_BASE64_COMPLEX(dataType, expr)`| Decodes a Base64-encoded string into a [complex type](sql-scalar.md#complex-type-names), where `dataType` represents the complex type and `expr` is the Base64-encoded string to decode.| +|`DECODE_BASE64_COMPLEX(dataType, expr)`| Decodes a Base64-encoded string into a complex data type, where `dataType` is the complex data type and `expr` is the Base64-encoded string to decode. The `hyperUnique` and `serializablePairLongString` data types are supported by default. You can enable support for the following complex data types by loading their extensions:
<ul><li>`druid-bloom-filter`: `bloom`</li><li>`druid-datasketches`: `arrayOfDoublesSketch`, `HLLSketch`, `KllDoublesSketch`, `KllFloatsSketch`, `quantilesDoublesSketch`, `thetaSketch`</li><li>`druid-histogram`: `approximateHistogram`, `fixedBucketsHistogram`</li><li>`druid-stats`: `variance`</li><li>`druid-compressed-big-decimal`: `compressedBigDecimal`</li><li>`druid-momentsketch`: `momentSketch`</li><li>`druid-tdigestsketch`: `tDigestSketch`</li></ul>
| |`NULLIF(value1, value2)`|Returns NULL if `value1` and `value2` match, else returns `value1`.| -|`NVL(value1, value2)`|Returns `value1` if `value1` is not null, otherwise `value2`.| - -### Complex type names - -The `DECODE_BASE64_COMPLEX` function accepts the following complex type names as arguments for the `dataType` parameter: - -Built-in: - * `hyperUnique` - * `serializablePairLongString` - -Bloom filter: - * `bloom` - -DataSketches: - * `arrayOfDoublesSketch` - * `HLLSketch` - * `KllDoublesSketch` - * `KllFloatsSketch` - * `quantilesDoublesSketch` - * `thetaSketch` - -Histogram: - * `approximateHistogram` - * `fixedBucketsHistogram` - -Stats: - * `variance` - -Compressed big decimal: - * `compressedBigDecimal` - -Moment sketch: - * `momentSketch` - -T-digest sketch: - * `tDigestSketch` \ No newline at end of file +|`NVL(value1, value2)`|Returns `value1` if `value1` is not null, otherwise `value2`.| \ No newline at end of file From c52411f9403f9b3ba7096903b492c05dbc1c776d Mon Sep 17 00:00:00 2001 From: Katya Macedo Date: Fri, 8 Dec 2023 16:44:14 -0600 Subject: [PATCH 5/5] Update spelling file --- website/.spelling | 2 ++ 1 file changed, 2 insertions(+) diff --git a/website/.spelling b/website/.spelling index 78e25b3285f1..82aaf47ca481 100644 --- a/website/.spelling +++ b/website/.spelling @@ -2186,6 +2186,8 @@ CHARACTER_LENGTH CURRENT_DATE CURRENT_TIMESTAMP DATE_TRUNC +DECODE_BASE64_COMPLEX +DECODE_BASE64_UTF8 DS_CDF DS_GET_QUANTILE DS_GET_QUANTILES
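The string-function table in this patch describes `DECODE_BASE64_UTF8`, `LPAD`, and `RPAD` only in prose. The Python snippet below is an illustrative model of those documented rules, not Druid's implementation. The helper names `decode_base64_utf8`, `lpad`, and `rpad` are hypothetical. Three behaviors are assumptions that the patch does not state: a non-positive `length` yields an empty string, truncation keeps the leftmost characters, and `chars` defaults to a single space. Python also counts code points, whereas Druid measures `LENGTH` in UTF-16 code units, so results can differ for characters outside the Basic Multilingual Plane.

```python
"""Illustrative model of the documented DECODE_BASE64_UTF8, LPAD, and RPAD rules.

This is a sketch under stated assumptions, not Druid's implementation.
"""
import base64
from typing import Optional


def decode_base64_utf8(expr: Optional[str]) -> Optional[str]:
    # Assumption: null in, null out, mirroring SQL null propagation.
    if expr is None:
        return None
    # Decode the Base64 text to raw bytes, then interpret the bytes as UTF-8.
    return base64.b64decode(expr).decode("utf-8")


def _pad(expr: Optional[str], length: int, chars: Optional[str], left: bool) -> Optional[str]:
    # Documented rule: the result is null if either expr or chars is null.
    if expr is None or chars is None:
        return None
    # Assumption: a non-positive length yields an empty string.
    if length <= 0:
        return ""
    # Documented rule: if length is shorter than expr, expr is truncated to length.
    # Assumption: the leftmost characters are kept for both LPAD and RPAD.
    if length <= len(expr):
        return expr[:length]
    # Documented rule: an empty chars string adds no padding.
    if chars == "":
        return expr
    missing = length - len(expr)
    # Repeat chars enough times, then cut the fill to exactly the missing width.
    fill = (chars * (missing // len(chars) + 1))[:missing]
    return fill + expr if left else expr + fill


def lpad(expr: Optional[str], length: int, chars: Optional[str] = " ") -> Optional[str]:
    # Assumption: chars defaults to a single space when omitted.
    return _pad(expr, length, chars, left=True)


def rpad(expr: Optional[str], length: int, chars: Optional[str] = " ") -> Optional[str]:
    # Assumption: chars defaults to a single space when omitted.
    return _pad(expr, length, chars, left=False)


if __name__ == "__main__":
    print(decode_base64_utf8("aGVsbG8="))  # hello
    print(lpad("abc", 8, "xy"))            # xyxyxabc
    print(rpad("abc", 8, "xy"))            # abcxyxyx
    print(lpad("abcdef", 3, "xy"))         # abc
```

In Druid SQL the corresponding calls would take the same arguments, for example `DECODE_BASE64_UTF8('aGVsbG8=')` or `LPAD('abc', 8, 'xy')`. Verify the exact edge-case behavior, such as invalid Base64 input or negative lengths, against a running cluster.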