Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose libpg_query scanner to include comments #8

Open
rattrayalex opened this issue Apr 23, 2021 · 15 comments
Open

expose libpg_query scanner to include comments #8

rattrayalex opened this issue Apr 23, 2021 · 15 comments

Comments

@rattrayalex
Copy link

Hey there,

Neither this nor https://github.com/pyramation/pgsql-parser clarify whether the AST it produces include comments by default, and whether or not there is a way to include SQL comments.

Now that lib_pgquery 2.0 supports comment emission, that would be nice to document (and, if not implemented, implement). This would be useful for implementing deparsers/beautifiers like prettier-plugin-pg.

Thanks!

@pyramation
Copy link
Collaborator

Comments to my knowledge are not an AST node returned by lib_pgquery, can you provide a link?

From my understanding, only comments left in tact would be those in functions, as function bodies are treated as text.

@pyramation
Copy link
Collaborator

pyramation commented Apr 26, 2021

Also, @benjie has explored this with this underlying tool before via the old version of this parser, and from what I understand, he had to approach comments with another library. If there is new development in lib_pgquery it could open up a new possibility!

@rattrayalex
Copy link
Author

Perhaps I'm mistaken; what I saw was this in the pg_query 2.0 announcement blog post:

There is a new function available to access the Postgres scanner. This includes the location of comments in a query text.

But perhaps that doesn't include comment contents, I haven't dug into it.

@rattrayalex
Copy link
Author

Ah, looking at the libpg_query readme again it looks like they have an example showing a token stream, including comments, being emitted. Usually including a list of comments and their locations is enough for a pretty-printer (so long as the ast nodes have location information as well).

@benjie
Copy link

benjie commented Apr 26, 2021

Yes I’m very keen for lib pgquery 2 to be supported so I can try this out. I’m hoping that it also gives better location info for all the nodes; but if not even just using the scanner (which does) would be useful.

@pyramation
Copy link
Collaborator

I see it here yes https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser

There is a new function available to access the Postgres scanner. This includes the location of comments in a query text. One could envision building a syntax highlighter based on this. Or extract comments from queries whilst ignoring comment-like tokens in a constant value.

So if this is the case, I've already integrated the latest 13-latest branch from libpg_query. Maybe @lfittl can shed light on where we can look to see information about comments?

@lfittl
Copy link

lfittl commented Apr 26, 2021

So if this is the case, I've already integrated the latest 13-latest branch from libpg_query. Maybe @lfittl can shed light on where we can look to see information about comments?

The important aspect here is that the comments are only part of the scanner output, not the parser output (since there is no parse tree representation of comments in Postgres).

In the ruby version, for reference:

[1] pry(main)> PgQuery.scan('SELECT 1 --comment')
=> [<PgQuery::ScanResult: version: 130002, tokens: [
<PgQuery::ScanToken: start: 0, end: 6, token: :SELECT, keyword_kind: :RESERVED_KEYWORD>,
<PgQuery::ScanToken: start: 7, end: 8, token: :ICONST, keyword_kind: :NO_KEYWORD>,
<PgQuery::ScanToken: start: 9, end: 18, token: :SQL_COMMENT, keyword_kind: :NO_KEYWORD>]>,
 []]

[3] pry(main)> PgQuery.parse('SELECT 1 --comment')
=> #<PgQuery::ParserResult:0x00007fea97305528
 @aliases=nil,
 @cte_names=nil,
 @query="SELECT 1 --comment",
 @tables=nil,
 @tree=
  <PgQuery::ParseResult: version: 130002, stmts: [<PgQuery::RawStmt: stmt: <PgQuery::Node: select_stmt: <PgQuery::SelectStmt: distinct_clause: [], target_list: [<PgQuery::Node: res_target: <PgQuery::ResTarget: name: "", indirection: [], val: <PgQuery::Node: a_const: <PgQuery::A_Const: val: <PgQuery::Node: integer: <PgQuery::Integer: ival: 1>>, location: 7>>, location: 7>>], from_clause: [], group_clause: [], window_clause: [], values_lists: [], sort_clause: [], limit_option: :LIMIT_OPTION_DEFAULT, locking_clause: [], op: :SETOP_NONE, all: false>>, stmt_location: 0, stmt_len: 0>]>,
 @warnings=[]>

@pyramation
Copy link
Collaborator

Thanks for the clarification @lfittl :)

Looks like we'd have to implement a connection via napi to the pg_query_scan C function. Here is where @ethanresnick had implemented the napi code https://github.com/pyramation/libpg-query-node/blob/master/src/async.cc for reference.

@benjie do you think if you had that type of raw scanner information, would it even be helpful? Of does the context of the AST location matter?

@benjie
Copy link

benjie commented Apr 26, 2021

If the parser includes enough location info then we can re-attach the comments from the scanner in the right place; it’s quite common to do this in tools like prettier. Last time I tried there wasn’t sufficient location info from the pg parser (a limitation of Postgres’ own parser, only the location info necessary for errors etc was present); but that was a fair while ago. The optimal situation would be if every AST node contained start and end (or length) information, either character offset or line/column.

@benjie
Copy link

benjie commented Apr 26, 2021

But honestly even if the parser doesn't suit, using the scanner to build a new parser is an option (and at least it's part of the job done!)

rattrayalex added a commit to rattrayalex/pg_query that referenced this issue Apr 27, 2021
@pyramation pyramation changed the title Document whether the output AST includes comments expose libpg_query scanner to include comments Apr 28, 2021
@pyramation
Copy link
Collaborator

pyramation commented Apr 28, 2021

For reference to see what it could be like to combine the scanner/parser:

Using the same query from scanner example:

SELECT 1 --comment

Tokens

[
<PgQuery::ScanToken: start: 0, end: 6, token: :SELECT, keyword_kind: :RESERVED_KEYWORD>,
<PgQuery::ScanToken: start: 7, end: 8, token: :ICONST, keyword_kind: :NO_KEYWORD>,
<PgQuery::ScanToken: start: 9, end: 18, token: :SQL_COMMENT, keyword_kind: :NO_KEYWORD>]>,
 []]

AST

[
  {
    "RawStmt": {
      "stmt": {
        "SelectStmt": {
          "targetList": [
            {
              "ResTarget": {
                "val": {
                  "A_Const": {
                    "val": {
                      "Integer": {
                        "ival": 1
                      }
                    },
                    "location": 7
                  }
                },
                "location": 7
              }
            }
          ],
          "limitOption": "LIMIT_OPTION_DEFAULT",
          "op": "SETOP_NONE"
        }
      }
    }
  }
]

I think it could be good to get a few more scanner examples and empirically look at the locations and try to see what the pattern can emerge. With this simple example, it seems that we only get location: 7 for the integer.

@benjie
Copy link

benjie commented Apr 28, 2021

Yeah that’s how it was before; the AST has insufficient information. But the scanner definitely opens up a new avenue; I always thought we’d need a custom parser to do what we’re trying to do, but building one atop Postgres’ own scanner would give me much greater confidence. Quite a few of the JS scanners seem to have issues scanning dollar-quoted strings, not to mention some of the more esoteric parts of postgres’ scanner.

@rattrayalex
Copy link
Author

Sounds like the scanner would be useful regardless!

Would it be worthwhile to file a feature request with postgres to optionally store start and end location information in the AST, or even to store comments in the AST? Do such feature requests already exist?

As an aside, I personally suspect it may be possible to build an ~adequate pretty-printer with only start location info in the AST as long as the token stream has comment contents and start/end location, but the comments would definitely end up in the "wrong" place more frequently. Whether or not this is the case isn't material to our conversations here, though.

@lfittl
Copy link

lfittl commented Apr 28, 2021

Would it be worthwhile to file a feature request with postgres to optionally store start and end location information in the AST, or even to store comments in the AST? Do such feature requests already exist?

I think the main question would be - is there a use case in the Postgres server itself, that warrants adding this. And I suspect the trade-off here would make this a difficult conversation (since storing this additional data consumes memory, and there isn't much gain for Postgres itself to have parse tree nodes for comments).

Important context: Even the comments in the scanner is not something that Postgres itself does today, but rather a small patch I added in pg_query (see https://github.com/pganalyze/libpg_query/blob/13-latest/patches/04_lexer_comments_as_tokens.patch)

You could consider teaching the parser to create the AST nodes and we can carry this as a patch in libpg_query - I don't have time to work on that myself at this point, but open to review a PR if someone wants to work on it.

@rattrayalex
Copy link
Author

rattrayalex commented Apr 28, 2021

Ah, got it – thank you! I may not be able to do that right away either so I opened an issue here, I hope that's okay: pganalyze/libpg_query#103

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants