Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong type on json #102

Closed
Eprince-hub opened this issue Nov 16, 2022 · 22 comments
Closed

Wrong type on json #102

Eprince-hub opened this issue Nov 16, 2022 · 22 comments

Comments

@Eprince-hub
Copy link

Describe the bug
A clear and concise description of what the bug is.
SafeQL doesn't accept types for json, other than any. Using the json_agg aggregate function makes SafeQL think that the returning column must be any type and throws an error otherwise. The query below throws an incorrect type annotation if any other type is used for the returningJson other than any

type JsonAgg = {
  a: string;
  b: string;
  c: string;
};

type SafeqlRecordWithJson = {
  id: number;
  firstName: string | null;
  lastName: string | null;
  returningJson: JsonAgg;
};

export async function getAllTestSafeqlRecordWithJson() {
  return await sql<SafeqlRecordWithJson[]>`
      SELECT
        *,
      (
        SELECT json_agg(ingredients)::json
        FROM (
        SELECT
          *
        FROM
          try_safe_ql
        )ingredients
      ) AS returning_json
      FROM
        test_safeql;
          `;
}

To Reproduce
Steps to reproduce the behavior:

  1. Setup SafeQL
  2. Use the code above in .ts file

Expected behavior
A clear and concise description of what you expected to happen.
I expected that using these kinds of aggregate functions shouldn't cause an error, when i am using the type the query should return or that SafeQL would suggest types other than any

Screenshots
If applicable, add screenshots to help explain your problem.

Screenshot 2022-11-16 at 08 19 08

Screenshot 2022-11-16 at 08 18 01

Screenshot 2022-11-16 at 08 22 16

Desktop (please complete the following information):

  • OS: MAC OS
  • PostgreSQL version 14
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

@Newbie012
Copy link
Collaborator

Thanks for the feedback! you can change that using overrides.types:

{
  "connections": {
    // ...
    "overrides": {
      "types": {
        "json": "JsonAgg"
      }
    }
  }
}

@Eprince-hub
Copy link
Author

@Newbie012

Hello, @Newbie012. I was exploring the suggestion you made above to solve the issue I am having with json type, and I find out that this mapping "json": "JsonAgg" doesn't represent the data type that the json_agg aggregate function is returning. Please, see the detailed explanation in this PR.

Please I want to confirm if I am correct or if I am missing something
upleveled/eslint-config-upleveled#239

@karlhorky
Copy link
Collaborator

Update from this comment by @Newbie012 : in the future, some JSON aggregation options will be supported (maybe also json_agg), once SafeQL moves away from RowDescription:

Currently, SafeQL mainly relies on PostgreSQL's RowDescription which explains the query structure without executing it.

PostgreSQL idiomatically infers the post_details row as json (which SafeQL output as any but is configurable).

Due to that, SafeQL is unable to infer the exact result of json_build_object.

I'll keep this open since I do plan eventually to support these kinds of limitations by moving away from RowDescription.

@karlhorky
Copy link
Collaborator

karlhorky commented Nov 30, 2023

@Newbie012 for the future design, one additional idea that I came across while writing my SQL tooling thread on Twitter/X was how @kristiandupont handles it in Kanel:

You can specify the type using a column comment. So if you have a table user with a column address that is JSON(B), you can run

COMMENT ON COLUMN user.address IS '@type:string ..and the rest of your comment, if any';

and now Kanel should turn it into a string. If you want to specify a type that needs to be imported, you can use @type(MyType, 'my-library', false, true, false)

Source: kristiandupont/kanel#429 (comment)

Sounds like it could be nice, to be able to specify the type information in the database 👍

@Newbie012
Copy link
Collaborator

Newbie012 commented Nov 30, 2023

Hmm.. that's an interesting approach!

Although I feel people won't like commenting code-specific comments on their database schema. Maybe I could implement it under a flag.

Another idea, would be to define them in a config. Something like:

{
  "connections": {
    "overrides": {
      "columns": {
        "person.address": "string",
        "person.geolocation": "import('@/types/Geo')"
      }
    }
  }
}

Would like to hear your thoughts on this 🙂

@karlhorky
Copy link
Collaborator

Personally I would lean towards having them in the database:

  1. "Database as the source of truth" as Kristian mentioned
  2. Keeping your types as close as possible to your data is a good thing. We often have multiple different generated JSON fields which should have different types based on what is being returned (sometimes ad-hoc or not a regular representation of a type). Keeping the type next to them would help a lot with creation and also maintenance, as they change over time.
  3. It would allow for portability of these types across different projects (to avoid having to configure SafeQL multiple times or share the ESLint config somehow)
  4. I think adding this metadata optionally for those who want to add JSON would be an ok thing to ask users to do

But if SafeQL doesn't want this approach and would rather have them defined separately as part of SafeQL config, then we would find a way to live with the feature elsewhere (as long as sufficiently flexible, allowing multiple different possible types returned from JSON fields).

@Newbie012
Copy link
Collaborator

Newbie012 commented Nov 30, 2023

I understand. My only concern here is that there's no "standard" of defining types in the database comments, which may lead to:

  • One tool could use the syntax ... IS '@type:string while a different tool would use ... IS '@type(string)'.
  • One database, multiple server packages - each one can define the type differently.

I could allow the option to find the type in the comments by regex while using @type(...) as the default.

@karlhorky
Copy link
Collaborator

Good point with standards and incompatibility across tooling.

Because there is no standard yet, gut feel says:

A) Use as much of existing standards / syntax as possible (eg. can it be TS syntax in there? similar to how ArkType does it)
B) Prefixing or identifying the comments somehow with a SafeQL-specific identifier eg. safeql or @safeql, to reduce the chances other tools will try to parse the comments

@kristiandupont
Copy link

kristiandupont commented Nov 30, 2023

For what it's worth, I wrote a parser with Peggy. While it's not a standard, the same syntax is also used by PostGraphile. (Note: that's tagging in general, not the @type tag specifically).

I think tagging could become a semi-standard if more libraries adopt it so I would be very happy to contribute in that regard. As for types, it's a bit more difficult because even if we agree that this is Postgres-only, other people might want types for other languages where the syntax would inevitably have to be different.

@Newbie012
Copy link
Collaborator

Thanks for sharing that @kristiandupont.

It seems like the parser is having hard time catching more complex terms:

import { parse } from "tagged-comment-parser";

const result = parse("@type:number @literal:'a'|'b' @object:{foo:number}");

But when I think about it, it doesn't have to. It could return a simple string and then SafeQL could treat it as a "pg" type, which could be transformed to any TypeScript type via overrides.types:

COMMENT ON COLUMN "user".address IS '@type:geo ...' 
{
  "connections": {
    "overrides": {
      "types": {
        "geo": "Geo"
      }
    }
  }
}

Since SafeQL should be used only with TypeScript & PostgreSQL, I'm less worried about conflicts with other languages. My issue is there can be potentially too much clutter when multiple clients access a single database.

@kristiandupont
Copy link

There are some shortcomings for sure, but not necessarily intentional. I am happy to expand on it, I've only been using it for personal stuff so far. The only criterium I have is that tags should be able to live side by side with a "regular" comment.

@karlhorky
Copy link
Collaborator

@Newbie012 should we rename this issue and reopen to track progress on this?

@Newbie012
Copy link
Collaborator

It seems more logical to keep this issue closed and open a new one since the original issue doesn't relate to the proposed feature. Using json_agg translates to a json type, which is different from a JSONB column that could correspond to a diverse type.

@karlhorky
Copy link
Collaborator

Hm, would this COMMENT solution not also work for json_agg columns?

It seems more logical to keep this issue closed and open a new one

Can definitely do this, but maybe json_agg could be one of the points for investigation + implementation in the new issue?

Proper json_agg types continues to be one of the largest challenges of us + students when using SafeQL.

@Newbie012
Copy link
Collaborator

It won't. json_agg is a function, not a type. Meaning, json_agg can accept any element. Not just relation).

As previously detailed here, SafeQL leans on PostgresSQL's RowDescription message for column metadata if available.

While scanning the query's AST for nullability checks, theoretically, I could simultaneously seek JSON/B expressions and attempt to deduce its type. Simple cases like json_agg(tbl) might be inferable, but this is purely speculative. Regardless, an issue reconciling this and #179 seems necessary, as they both stem from the same issue.

@karlhorky
Copy link
Collaborator

json_agg is a function, not a type

Yeah, I think I'm understanding that part.

What I meant was something like this:

SELECT
  cohorts.id,
  cohorts.title,
  (
    SELECT json_agg(cohort_events)
    FROM
      (
        SELECT events.name FROM
          cohorts_events
          INNER JOIN events ON cohorts_events.event_id = events.id
        WHERE cohorts_events.cohort_id = cohorts.id
      ) AS cohort_events
  ) AS events COMMENT IS '@type:{name: string}[]'
FROM
  cohorts;

For the calculated field with the subquery events, SafeQL would annotate this as {name: string}[]

But looking around, I am not sure that this syntax exists in PostgreSQL.

Some alternative syntaxes:

If SafeQL can read / parse SQL comments and connect it to a field:

SELECT
  cohorts.id,
  cohorts.title,
  (
    SELECT json_agg(cohort_events)
    FROM
      (
        SELECT events.name FROM
          cohorts_events
          INNER JOIN events ON cohorts_events.event_id = events.id
        WHERE cohorts_events.cohort_id = cohorts.id
      ) AS cohort_events
  ) AS events -- @type:{name: string}[]
FROM
  cohorts;

If TypeScript comments / functions could be connected to the field:

SELECT
  cohorts.id,
  cohorts.title,
  (
    SELECT json_agg(cohort_events)
    FROM
      (
        SELECT events.name FROM
          cohorts_events
          INNER JOIN events ON cohorts_events.event_id = events.id
        WHERE cohorts_events.cohort_id = cohorts.id
      ) AS cohort_events
  ) AS events ${/* @type:{name: string}[] */ sql``}
FROM
  cohorts;

or

SELECT
  cohorts.id,
  cohorts.title,
  (
    SELECT json_agg(cohort_events)
    FROM
      (
        SELECT events.name FROM
          cohorts_events
          INNER JOIN events ON cohorts_events.event_id = events.id
        WHERE cohorts_events.cohort_id = cohorts.id
      ) AS cohort_events
  ) AS events ${type('{name: string}[]')}
FROM
  cohorts;

@karlhorky
Copy link
Collaborator

karlhorky commented Dec 3, 2023

While scanning the query's AST for nullability checks, theoretically, I could simultaneously seek JSON/B expressions and attempt to deduce its type. Simple cases like json_agg(tbl) might be inferable, but this is purely speculative.

Oh, just thought of something else - not sure if this is the same idea as what you mentioned here:

Running subqueries recursively when they are encountered in the query, to find the type of the data, if the column is type json. (or parsing the JSON if it's a literal value)

@Newbie012
Copy link
Collaborator

It seems like pg comments aren't parsed at the moment.

About your suggestion - While I understand how it somewhat solves your issue, It doesn't feel like a future-proof code. Queries are volatile, and comments may get disconnected from the logic. I think gradually adding support for json type inference would be more promising.

@karlhorky
Copy link
Collaborator

Ok, thanks for the note. I'll open an issue and link to here and the other issue and we can see where this goes 👍

@karlhorky karlhorky mentioned this issue Dec 3, 2023
3 tasks
@karlhorky
Copy link
Collaborator

karlhorky commented Dec 3, 2023

Created a new issue here, feel free to edit as needed:

If you don't have objections, issue #179 could be closed with a note that #190 supersedes it.

@Newbie012
Copy link
Collaborator

Update - #190 (comment)

@karlhorky
Copy link
Collaborator

Thanks for the feedback! you can change that using overrides.types:

{
  "connections": {
    // ...
    "overrides": {
      "types": {
        "json": "JsonAgg"
      }
    }
  }
}

If you're using options like overrides.types.json or overrides.types.jsonb and are having troubles with the JsonAgg type only referring to a single TypeScript type, creating multiple files offers a way to work around this:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants