v2.0.0 #90

Open

wants to merge 21 commits into master from KG-v2.0
Conversation

toptobes
Collaborator

No description provided.

toptobes and others added 20 commits September 29, 2024 20:26
* remove deprecated vector/vectorize parameters

* remove bulkWrite

* remove deleteAll

* removed namespace terminology

* removed db.collections()

* removed client.db(id, region)

* update api report
* some experimental table typing

* dark magic

* added default type for table

* made InferTableSchema more flexible

* moved table.ts into its own folder (mirrors collections)

* more type errors

* broke up table.ts file in to its proper structure

* more work on tables idk

* start work on common command impls class

* fixed couple rebasing errors

* moved all collection functions to generic internal CommandsImpls object

* fixed all of the bugs I introduced in the previous commit :)

* more implementation work

* added table methods for db as well

* added types (but not impl) for alterTable

* add countRows

* add more missing table functions

* added various datatypes (dates & ip addrs)

* update build report

* update build report

* some datatype tweaks (mostly for InetAddress)

* some type operations when translating cql types
* various cursor tweaks & fixes

* minor typing tweaks

* fixed bug with filter being potentially mutable

* made cursors immutable

* some tests not all idk
* started work overhauling validation logic

* so much validation

* some folder restructuring

* logging hierarchy

* basic logging impl

* implement warning events

* tests for logging and such
* documented table-schema file

* tsdoc work

* remove checkExists

* ser/des work

* playground script

* started adding custom inspects and work on datetimes

* made _httpClient fully public

* work on DataAPIVector stuff

* some restructuring

* made all Promise<true> methods just return Promise<void>

* removed rackstackk hack

* cqlblob type

* additionalHeaders

* started documenting table-related stuff

* formatting for events logging

* documentation for logging

* document collection class

* created CursorError

* set up test suite for table tests

* move dropIndex to db level

* lot of work on bignumbers hack...

* remove CollectionNotFoundError

* added table.definition()

* super basic insertOne test

* basic findone tests

* toomanyrowstocount error

* start documenting serdes

* split cumulative errors + some more bignumber serdes work

* changed $PrimaryKeyType to be a string + some test typing fixes

* added sparse data support
* make DataAPIDbAdmin keyspace options no longer extend AdminBlockingOptions

* refactor raw db info into more workable objects

* more unified naming convention of admin interfaces

* bit of name tweaking

* some admin info interface tweaking

* light, temp documentation
* split cursor classes

* little bit of cleanup

* moved cursors test to documents/collections

* unit tests for split cursors

* integration tests for split cursors
* timeouts overhaul

* keyspace impl & tests

* more tests & tweaks

* fix timeout/sort bugs

* refined WithTimeout types

* createCollection custom timeout impl

* more docs and stuff
* a bunch of tests work

* couple minor test fixes
* more intuitive naming for events/logging stuff

* formatting + timestamps for log messages

* dropIndex ifExists

* sourceModel

* createIndex options restructuring

* split filter & update types

* remove cql from datatypes names

* timeout for cursor.toArray() & coll.distinct()

* added class names for admin event name sources

* remove "spawn" from spawn type names
* updated serdes

* changed token provider a bit

* minor linting fix

* fixed couple import/typing issues + tables readme

* add shorthand datatype functions

* update readme w/ shorthand datatype fns

* marked more internal values

* removed deeppartial & strict filters/sorts/projs

* tiny bit of renaming
* updated serdes

* add shorthand datatype functions

* update readme w/ shorthand datatype fns

* marked more internal values

* start documentation of tables

* added listIndexes

* some altertable fixes

* bump min node version to v18+
* advanced typings for tables/colls; typings for includeSimilarity

* fixed tables typing test file

* update readme
* switch to codec-based ser/des system

* camel snake case interop

* example and many bug fixes and tweaks and stuff idk lol

* added serdes path matching & class-mapping example
* some fixes/work

* some typing work

* added tsdoc everywhere for the most part

* reset example astra-db-ts versions

* few more tests

* minor updates to examples
* update uuid stuff + start datatypes tests

* reexport bignumber

* some datatype tests
* unification of codec types

* snakeCaseInterop => keyTransformer

* some unit tests on ser-des options
@vkarpov15
Contributor

Overall looks reasonable, just had a couple of suggestions/thoughts:

  1. I thought the plan was to make httpClient public on client and db?
  2. It would be great to improve error message when mixing up collections and tables. For example, if you try to insert a doc using table logic into a collection, you end up with the following error message:
    TypeError: Cannot convert undefined or null to object
      at Function.entries (<anonymous>)
      at TableSerDes.adaptDesCtx (node_modules/@datastax/astra-db-ts/dist/documents/tables/ser-des/ser-des.js:38:53)
      at TableSerDes.deserializeRecord (node_modules/@datastax/astra-db-ts/dist/lib/api/ser-des/ser-des.js:51:26)
      at CommandImpls.insertOne (node_modules/@datastax/astra-db-ts/dist/documents/commands/command-impls.js:50:38)

@toptobes
Collaborator Author

toptobes commented Dec 22, 2024

I thought the plan was to make httpClient public on client and db?

DataAPIClient doesn't actually have an HttpClient, but it is public on Db, Table, Collection, and the three *Admin classes as ._httpClient.

The _ prefix remains to keep it out of autocomplete and to signify that it's really not a common property you should be accessing and working with.

Anyway, you shouldn't need the Feature-Flag hack anymore, since tables are enabled by default now (right?). But if you did need any other Feature-Flag in the future, there's a new dbOptions.additionalHeaders option for specifying that sort of thing in the DataAPIClientOptions (or just additionalHeaders in DbOptions)

For example, if you try to insert a doc using table logic into a collection, you end up with the following error message...

Oh yeah I forgot about this case, I'll add some checks that'll basically say "hey you inserted a collection doc into a table and it succeeded but we failed to parse the response since you're using the table class", or, for finds, "hey you tried to find on a table but it was actually a collection so we failed".
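A check like that could be sketched roughly as follows — the function name and the `primaryKeySchema` field here are illustrative assumptions about the response shape, not the actual astra-db-ts implementation:

```typescript
// Hypothetical sketch: detect a collection-style response while
// deserializing through the Table class, and fail with a readable error
// instead of "Cannot convert undefined or null to object".
function assertTableResponse(status: Record<string, unknown> | undefined): void {
  if (!status || !('primaryKeySchema' in status)) {
    throw new Error(
      'Expected a table response, but got a collection-style one — ' +
      'are you using the Table class against a collection?',
    );
  }
}
```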

Thanks for pointing this one out 👍

If you have any other thoughts/suggestions, no matter how nitpicky or small, please do share

* start work on more docs md pages

* created colls/tables datatype cheatsheet

* did collections dts

* readme work

* ran npm audit fix

* minor tweaks to DATATYPES.md

* bunch of tsdoc for create-table related types

* tsdoc for DataAPILoggingDefaults

* update examples to use @next version of astra-db-ts

* documented list-tables
@vkarpov15
Contributor

A couple of other issues I ran into:

  1. DataAPITimestamp's SerializeForCollection appears to serialize DataAPITimestamps as { $date: string } rather than { $date: number }, which leads to DataAPIResponseError: Bad JSON Extension value: Date ($date) needs to have NUMBER value, has STRING when trying to save a DataAPITimestamp into a collection.
  2. Why doesn't table serialization automatically handle converting JavaScript dates into timestamps?
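For point (1), a minimal illustration of the two payload shapes — assuming the collection wire format expects `$date` to carry epoch milliseconds as a number:

```typescript
// The $date extension expects a NUMBER (epoch millis); serializing the ISO
// string instead is what triggers the "needs to have NUMBER value" error.
const ts = new Date('2024-12-22T00:00:00Z');

const correct = { $date: ts.getTime() };    // number → accepted
const buggy = { $date: ts.toISOString() };  // string → rejected by the Data API

console.log(typeof correct.$date); // "number"
console.log(typeof buggy.$date);   // "string"
```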

@toptobes
Collaborator Author

  1. Ope, thanks for pointing that out, definite bug
  2. It uses DataAPITimestamp for symmetry with the other dates/durations; it also ensures people realise Date ≠ date. That being said, I suppose I could still allow you to insert as Dates, but you'd still read them back as DataAPITimestamps. Also, if you really want to use Dates instead of Timestamps for both r/w, there's a codec for that in TableCodecs, but it's undocumented atm since the ser/des options are still in beta

@vkarpov15
Contributor

Re: DataAPITimestamp point (2), I would recommend automatically serializing JS dates to DataAPITimestamp because they're fundamentally the same type: integer containing milliseconds since epoch. Not much reason to make a distinction between the two. Is there even a reason to have a separate DataAPITimestamp type?
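The "fundamentally the same type" point can be shown directly — a JS Date is just an epoch-milliseconds integer with methods on top:

```typescript
// A Date round-trips exactly through its epoch-milliseconds representation,
// which is the basis of the argument for auto-serializing Dates as timestamps.
const millis = 1735689600000; // 2025-01-01T00:00:00.000Z
const date = new Date(millis);

console.log(date.getTime() === millis); // true — lossless round-trip
console.log(date.toISOString());        // "2025-01-01T00:00:00.000Z"
```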

Right now sending a JS Date to insertOne() results in a "Error trying to convert to targetCQLType TIMESTAMP from value.class java.util.HashMap, value {}. Root cause: no codec matching value type." error, so I think the two options are (1) serialize Dates automatically, (2) throw a more readable error indicating that Dates are not serialized automatically. The current error is not great because users don't know why dates are getting serialized into hashmaps.

Some more notes:

  1. It doesn't look like I can import TableIndexDescriptor, and without that I can't use listIndexes() because of error Return type of public method from exported class has or is using name 'TableIndexDescriptor' from external module "node_modules/@datastax/astra-db-ts/dist/astra-db-ts" but cannot be named.
  2. There's a createIndex() method on the Table class, why is there no dropIndex()? Looks like there's just a dropTableIndex() on db, but that seems inconsistent.
  3. No BigInt support for tables? Looks like enableBigNumbers is only allowed on collections. Also, why is BigInt support behind a flag?

@toptobes
Collaborator Author

toptobes commented Dec 29, 2024

Regarding #1 & #2, the main reason for the separate DataAPITimestamp is consistency; otherwise, there would be both Date and DataAPIDate classes, which may be a bit confusing/unintuitive.

I'm leaning towards throwing a readable error (agree that it's currently not readable) that says "either use a DataAPITimestamp, or enable the codec to use Dates instead".

I'm on the fence about serializing dates but still reading them back as a DataAPITimestamp, as it could be somewhat surprising behavior, but at the same time, we allow the same with vectors and such, so Idk 🤷

It doesn't look like I can import TableIndexDescriptor

Oops, bug, will fix in next preview release.

If you really want a quick fix for implementation/testing purposes, you can use

type TableIndexDescriptor = Awaited<ReturnType<InstanceType<typeof Table>['listIndexes']>> extends (infer T extends object)[] ? T : never;

There's a createIndex() method on the Table class, why is there no dropIndex()

This came from the Data API team themselves. On the Data API as well, dropIndex is a keyspace operation, while the create*Index operations are all table operations.

The idea is to make it clearer that index names exist within the keyspace, and not the table, so it reduces the chance of someone dropping an index that they didn't mean to.

This one's out of my control 🤷

No BigInt support for tables?

No, there is complete bigint (and BigNumber) support for tables; it's automatically enabled if you're using a table with varint or decimal columns (at the cost of some performance)

Looks like enableBigNumbers is only allowed on collections. Also, why is BigInt support behind a flag?

Actually, when someone uses bigints or BigNumbers, I need to fall back to the json-bigint library, which is a drop-in replacement for the native JSON module, except it has actual bigint/BigNumber support.

However, this library is written purely in JS, unlike the native JSON module, which is typically implemented in highly-optimized C++, and is therefore decently slower. Plus, it sets a null prototype for each JS object, so I need to recursively fix the proto for each parsed object and its nested objects as well.

TL;DR: it's a bit of a mess because the native JSON module doesn't support big numbers, and the Data API doesn't want to allow big numbers to be read as, or even written as, strings; instead, they're insistent on having bignumbers be raw JSON number literals, since it's technically allowed in the spec.
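The underlying problem is easy to reproduce with nothing but the native JSON module — JS numbers are IEEE-754 doubles, so integer literals beyond Number.MAX_SAFE_INTEGER silently lose precision on parse:

```typescript
// Why a plain JSON.parse can't round-trip the Data API's raw big-number
// literals: the value is coerced to a lossy double.
const raw = '{"varint": 1231231222132132131231231231231231231231232132133}';
const parsed = JSON.parse(raw);

console.log(Number.isSafeInteger(parsed.varint)); // false — beyond 2^53 - 1
console.log(JSON.stringify(parsed) === raw);      // false — precision was lost
```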

Can expand if necessary, but I can't really enable bignumbers by default for collections (I technically could for serialization, but not for deserialization; however, I'd rather have it explicitly on/off instead of having some half/half behavior).

I'll try to throw a readable error here as well for bignums on collections that don't have bignums enabled

@vkarpov15
Contributor

Re: BigInts, I looked a little further into it, and here's where my misunderstanding is:

  1. BigInt deserialization is only enabled for decimal and varint, but not long. Ideally should also deserialize longs into bigints since max long is much larger than JavaScript's max safe integer
  2. No BigInt serialization for tables. json-bigint is only used for deserialization, but there's no code path that I see which uses json-bigint for serialization (tables or collections). That means the only way to serialize BigInts is to convert them to numbers first.
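The rationale for point (1): a CQL long is a signed 64-bit integer, whose range far exceeds what a JS number can represent exactly:

```typescript
// Deserializing longs as JS numbers is lossy near the top of the range;
// bigint is exact.
const maxLong = 9223372036854775807n; // 2^63 - 1, max signed 64-bit value

console.log(maxLong > BigInt(Number.MAX_SAFE_INTEGER)); // true
console.log(BigInt(Number(maxLong)) === maxLong);       // false — Number() rounds
```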

@toptobes
Collaborator Author

toptobes commented Jan 2, 2025

BigInt deserialization is only enabled for decimal and varint, but not long.

oh you're right... I forgot about that case... I suppose bigint columns should always be using/returning bigints instead of numbers...

But there's no code path that I see which uses json-bigint for serialization (tables or collections)

No, there very much is. I can't say that the code for it is the cleanest, since it's a pretty messy situation in the first place that had to be retrofitted (I didn't expect the Data API to go this route), but it exists.

This is in data-api-http-client.ts, in the executeCommand method:

const serialized = (info.bigNumsPresent)
  ? this.bigNumHack?.parser.stringify(info.command) // optional since it's not present in DataAPIDbAdmin usages. Not super happy with the loose invariant, but it is what it is
  : JSON.stringify(info.command);

Example:

> await table.insertOne({ text: '1', int: 1, varint: 1231231222132132131231231231231231231231232132133n })
{ insertedId: { text: '1', int: 1 } }
> await tfa
[
  {
    varint: 1231231222132132131231231231231231231231232132133n,
    int: 1,
    text: '1'
  }
]

@toptobes
Collaborator Author

toptobes commented Jan 2, 2025

Also, for what it's worth, I strongly made the same argument for accepting and returning varint & decimal values as strings, but the Data API team decided against it since the JSON spec technically allows it, so they found it more "pure" that way 🤷. The Python client also suffers from the same issue, and its workaround ended up needing to be traumatizingly horrific.

though to be fair, I guess the issue still would've existed for bigint columns as well, as you pointed out

@toptobes
Collaborator Author

toptobes commented Jan 2, 2025

Actually a known bug you may be running into is if you're using BigNumbers in tables, but using your own version of the library instead of the one reexported by astra-db-ts, the code won't pick up that it's a BigNumber (since instanceof BigNumber would be false since it's technically a different class)
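The dual-copy hazard generalizes beyond BigNumber — two structurally identical classes from different installed copies of a package fail instanceof checks, since instanceof compares constructor identity. A minimal sketch (the class names here stand in for the two copies of the library):

```typescript
// Two copies of "the same" class, as when a package is installed twice.
class BigNumberA { constructor(public value: string) {} }
class BigNumberB { constructor(public value: string) {} }

const n = new BigNumberB('123');

console.log(n instanceof BigNumberB); // true
console.log(n instanceof BigNumberA); // false — different constructor identity
```

This is why using the BigNumber reexported by astra-db-ts works while a separately-installed copy does not.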

I have a fix for this in a local branch already.

@toptobes force-pushed the KG-v2.0 branch 2 times, most recently from 25a764d to 8ee74e2 on January 3, 2025 18:26
@vkarpov15
Contributor

One more point I noticed:

  1. Do you intend to add returnDocumentResponses support for insertMany()? Looks like that is still hardcoded to returnDocumentResponses: true under the hood.

@toptobes
Collaborator Author

toptobes commented Jan 7, 2025

Do you intend to add returnDocumentResponses support for insertMany()? Looks like that is still hardcoded to returnDocumentResponses: true under the hood.

As far as I'm aware, the implementation still doesn't work if the vectorize provider fails; still waiting on the Data API to fix that before it can be safely exposed for general use.
