
Suggest DECIMAL cql type to be returned as string #1608

Open
hemidactylus opened this issue Oct 25, 2024 · 7 comments
Labels
Feature (Agreed feature yet to be implemented) · On Hold (Not ready for active development)

Comments

@hemidactylus
Contributor

hemidactylus commented Oct 25, 2024

The rounding issues that essentially led to the creation of the DECIMAL cql type would dictate that such columns be returned as strings, letting callers/users/clients do the right thing. Otherwise, unexpected results may occur.

Principle

In binary floating-point, 3.3 - 2.2 - 1.1 != 0 because of rounding issues during the calculation. Decimal uses a base-10 representation for fractional numbers that avoids this.
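As a minimal illustration in Python, using the standard-library decimal module (not part of the original report):

from decimal import Decimal

# Binary floating point leaves a tiny nonzero residue:
print(3.3 - 2.2 - 1.1)
# -4.440892098500626e-16

# Base-10 decimal arithmetic is exact:
print(Decimal("3.3") - Decimal("2.2") - Decimal("1.1"))
# 0.0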

In pure CQL

This table has two columns (decimal, float) populated with the same values, which demonstrates the difference:

CREATE TABLE default_keyspace.tmp_d (
    a int PRIMARY KEY,
    b decimal,
    c float
);
cassandra@cqlsh:default_keyspace> select * from tmp_d  ;

 a  | b    | c
----+------+------
 -1 | -1.1 | -1.1
 -2 | -2.2 | -2.2
  3 |  3.3 |  3.3

The DECIMAL column:

cassandra@cqlsh:default_keyspace> select sum(b) from tmp_d;

 system.sum(b)
---------------
           0.0

The FLOAT column:

cassandra@cqlsh:default_keyspace> select sum(c) from tmp_d;

 system.sum(c)
---------------
   -1.1921e-07

Data API

When reading this table with the Data API (dockerized 1.0.18 to be precise), this is what I see:

{
    "data": {
        "documents": [
            {
                "a": -1,
                "b": -1.1,
                "c": -1.1
            },
            {
                "a": -2,
                "b": -2.2,
                "c": -2.2
            },
            {
                "a": 3,
                "b": 3.3,
                "c": 3.3
            }
        ],
        "nextPageState": null
    },
    "status": {
        "warnings": [
            "Zero filters were provided in the ..."
        ]
    }
}

As a result, there is lossy math when the original intent of the column type was precisely to avoid that.
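For instance, here is a sketch (in Python, over a hypothetical excerpt of the response above) of what a client sees if it parses the returned JSON with a standard parser, which maps JSON numbers to binary doubles:

import json

# Hypothetical excerpt of the documents returned above
response = '[{"a": -1, "b": -1.1}, {"a": -2, "b": -2.2}, {"a": 3, "b": 3.3}]'

docs = json.loads(response)
# The decimal column has lost its exact base-10 semantics:
print(sum(doc["b"] for doc in docs))
# -4.440892098500626e-16, not 0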

@tatu-at-datastax
Contributor

I do not follow: since DECIMAL is specifically a 10-based value, it should have an exact textual representation.

But even with 2-based (binary) floating-point I am not sure I get the point: yes, a number's representation as 10-based text may need to be rounded -- but would use of a JSON String make any difference? How would that help?

@tatu-at-datastax
Contributor

Ok, based on discussions, this is a concern wrt JSON handling differing between JSON Numbers and JSON Strings -- so if the server side were to use 2-based FPs (float, double) for all numbers, then use of JSON Strings would avoid the issue.

As things are, the Data API uses Java BigDecimal for all JSON numbers with fractions, so we should be good -- but if testing shows problems, we definitely need to address them.

@vkarpov15
Collaborator

My 2 cents: I think returning decimals as numbers is the right way to go. There shouldn't be any loss of precision since JSON serialization isn't doing any arithmetic. And it seems semantically cleaner for numbers to end up as numbers in JSON.

However, serializing decimals as strings wouldn't be unprecedented. For example, MongoDB's JavaScript BSON serializer only supports Decimal128.fromString(); there's no way to convert a JavaScript number (binary floating point) to a Decimal128 without going through a string first.

@amorton
Contributor

amorton commented Nov 4, 2024

Reading through...

In pure CQL

Not sure how this exactly relates to the issue of encoding decimal; yes, there is a reason for having floats and decimals.

I will note that the Decimal and Float sums above are the same, depending on the number of decimal places, and this is why we have decimal:

In [5]: s = "-9.9362e-10"

In [6]: f = float(s)

In [8]: print(f"10 decimal places: {f:.10f}")
10 decimal places: -0.0000000010

In [12]: print(f"1 decimal places: {f:.1f}")
1 decimal places: -0.0

# using new rounding https://peps.python.org/pep-0682/
In [20]:  print(f"1 decimal places: {f:z.1f}")
1 decimal places: 0.0

In [11]: float("-0.0") == float("0")
Out[11]: True

In [14]: float("-0.0") == int("0")
Out[14]: True

# false if we use the full exponent 
In [15]: float(s) == int("0")
Out[15]: False

Data API

I am not following the example when it says "As a result, there is lossy math when the original intent of the column type was precisely to avoid that": the values returned by the Data API are exactly the same as those returned by CQL.

For example, MongoDB's JavaScript BSON serializer only supports Decimal128.fromString(); there's no way to convert a JavaScript number (binary floating point) to a Decimal128 without going through a string first.

These are the Mongo docs for this: https://github.com/mongodb/specifications/blob/master/source/bson-decimal128/decimal128.md

and https://www.mongodb.com/docs/manual/reference/mongodb-extended-json/#mongodb-extended-json--v2-

Thoughts...

not decisions, just things to be considered...

JSON view on numbers

As the JSON - Data Types page says....

Numbers in JSON are agnostic with regard to their representation within programming languages. While this allows for numbers of arbitrary precision to be serialized, it may lead to portability issues... There is no inherent precision loss in serializing a machine-level binary representation of a floating-point number (like binary64) into a human-readable decimal representation (like numbers in JSON) and back since there exist published algorithms to do this exactly and optimally.

My view on this is that the JSON protocol says it has a way to encode any number that is not infinite or NaN, and then says it is up to the parsing code to alert the user if there is an issue with the decoding, such as when the code cannot represent the number.
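As a quick check of the round-trip claim quoted above, a Python sketch (Python's serializer uses the shortest-exact decimal representation of a double):

import json

x = 0.1 + 0.2                # 0.30000000000000004, a binary64 value
s = json.dumps(x)            # serialize to decimal text: "0.30000000000000004"
assert json.loads(s) == x    # the round trip recovers the exact same double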

Mongo DB view on Decimal

If a decimal is declared, they are always using an EJSON value { "$numberDecimal": "<number>" }; this is the same for both their Canonical and Relaxed JSON formats. The BSON formatting is "IEEE 754-2008 128-bit decimal floating point"; this is what they are sending over the wire, and remember BSON has a little richer data types.

And if I read the spec correctly, they are telling clients not to round trip into the native decimal types.
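If the Data API adopted a similar wrapper, client-side decoding could look like this hypothetical Python sketch (the "$numberDecimal" key mirrors Mongo's EJSON; nothing here is an existing Data API format):

import json
from decimal import Decimal

def decode_decimals(obj):
    # Hypothetical: unwrap {"$numberDecimal": "<digits>"} into an exact Decimal
    if set(obj) == {"$numberDecimal"}:
        return Decimal(obj["$numberDecimal"])
    return obj

doc = json.loads('{"b": {"$numberDecimal": "3.3"}}', object_hook=decode_decimals)
print(doc["b"] - Decimal("2.2") - Decimal("1.1"))
# 0.0, exact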

Telephone number problem

What happens with the classic telephone number problem: the string "012345" is a string of digits, not a number.

Relaxed vs Strict

We are building the relaxed representation first, and the idea is to be as basic a JSON as we can. Any read will return the schema; if a client wants to go beyond handling a number as a JSON number, it can use the returned schema to know which fields are decimals. The parsers in Python and Node.js and other libraries allow for special handling of fields and floats to handle these cases, as sketched below.
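For example, a minimal Python sketch of such special handling, using the standard parser's parse_float hook (the document excerpt is hypothetical):

import json
from decimal import Decimal

# Parse every JSON number with a fraction as an exact Decimal
# instead of a binary double:
doc = json.loads('{"b": -1.1, "c": -1.1}', parse_float=Decimal)

print(doc["b"] + Decimal("-2.2") + Decimal("3.3"))
# 0.0, exact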

@tatu-at-datastax
Contributor

Related: filed #1654 wrt "other direction" for decimal type.

@amorton added the "On Hold (Not ready for active development)" label on Dec 11, 2024
@amorton
Contributor

amorton commented Dec 11, 2024

Moving to on hold; still think we should use plain JSON for the simple encoding, and can then do this for the strict encoding format.

@vkarpov15
Collaborator

I thought a little bit more about this in the context of datastax/astra-db-ts#90 and I think I see the reason for returning DECIMAL, as well as potentially VARINT and LONG, as strings. Built-in JSON parsers, like JavaScript's, are often limited by the language's built-in number limits, so values that cannot safely fit in numbers may lose precision.

For example, 9223372036854775807 is the max long value in Java. Consider what happens when JavaScript's JSON.parse() tries to handle that value as a number in the following example: JavaScript ends up parsing it into 9223372036854776000 as a JavaScript number, which is an incorrect result.

$ node
Welcome to Node.js v20.10.0.
Type ".help" for more information.
> const str = '9,223,372,036,854,775,807'.replace(/,/g, '')
undefined
> str
'9223372036854775807'
> JSON.parse(`{"value":${str}}`)
{ value: 9223372036854776000 }
> `{"value":${str}}`
'{"value":9223372036854775807}'
> 

This puts clients in a potentially tricky position of needing to implement customized JSON parsing.
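For comparison, a Python sketch of the same value: Python's parser keeps integers exact, coercing to a binary double reproduces the loss JavaScript sees, and a string representation sidesteps the parser entirely:

import json

s = '{"value": 9223372036854775807}'   # Java's Long.MAX_VALUE

v = json.loads(s)["value"]
print(v)          # 9223372036854775807 -- Python ints are arbitrary precision
print(float(v))   # 9.223372036854776e+18 -- the binary-double rounding JS hits

# Returning the value as a JSON string leaves conversion to the client:
w = json.loads('{"value": "9223372036854775807"}')["value"]
print(w)          # '9223372036854775807', exact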
