expose libpg_query scanner to include comments #8
To my knowledge, comments are not an AST node returned by libpg_query; can you provide a link? From my understanding, the only comments left intact would be those in functions, as function bodies are treated as text.
Also, @benjie has explored this with this underlying tool before via the old version of this parser, and from what I understand, he had to approach comments with another library. If there is new development in libpg_query, it could open up a new possibility!
Perhaps I'm mistaken; what I saw was this in the pg_query 2.0 announcement blog post:
But perhaps that doesn't include comment contents; I haven't dug into it.
Ah, looking at the libpg_query readme again, it looks like they have an example showing a token stream, including comments, being emitted. Usually a list of comments and their locations is enough for a pretty-printer (so long as the AST nodes have location information as well).
Yes, I’m very keen for libpg_query 2 to be supported so I can try this out. I’m hoping that it also gives better location info for all the nodes; but if not, even just using the scanner (which does) would be useful.
I see it here, yes: https://pganalyze.com/blog/pg-query-2-0-postgres-query-parser
So if this is the case, I've already integrated the latest 13-latest branch from libpg_query. Maybe @lfittl can shed light on where we can look to see information about comments? |
The important aspect here is that the comments are only part of the scanner output, not the parser output (since there is no parse tree representation of comments in Postgres). In the ruby version, for reference:
Thanks for the clarification @lfittl :) Looks like we'd have to implement a connection via the scanner output. @benjie, do you think that if you had that type of raw scanner information, it would even be helpful? Or does the context of the AST location matter?
If the parser includes enough location info then we can re-attach the comments from the scanner in the right place; it’s quite common to do this in tools like prettier. Last time I tried there wasn’t sufficient location info from the pg parser (a limitation of Postgres’ own parser; only the location info necessary for errors etc. was present), but that was a fair while ago. The optimal situation would be if every AST node contained start and end (or length) information, either as character offsets or line/column.
But honestly, even if the parser doesn't suit, using the scanner to build a new parser is an option (and at least part of the job is done!)
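To make the re-attachment idea above concrete, here is a minimal sketch in TypeScript. All names (`CommentToken`, `AstNode`, `attachComments`) are hypothetical and not part of libpg_query's API; it only illustrates the prettier-style strategy of pairing each comment from the token stream with the nearest AST node by start offset.

```typescript
// Hypothetical shapes; real libpg_query output differs.
interface CommentToken {
  start: number; // character offset where the comment begins
  end: number;   // character offset just past the comment
  text: string;
}

interface AstNode {
  kind: string;
  location: number; // start offset, like libpg_query's `location` field
  comments?: CommentToken[];
}

// Attach each comment to the first node that starts at or after the point
// where the comment ends; a trailing comment goes to the last node.
function attachComments(nodes: AstNode[], comments: CommentToken[]): void {
  const byLocation = [...nodes].sort((a, b) => a.location - b.location);
  for (const comment of comments) {
    const owner =
      byLocation.find((n) => n.location >= comment.end) ??
      byLocation[byLocation.length - 1];
    (owner.comments ??= []).push(comment);
  }
}
```

With only start offsets, this heuristic will sometimes attach a comment to the "wrong" node, which is exactly why start/end (or length) information on every node would help.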
Noticed this here: launchql/libpg-query-node#8 (comment)
For reference, to see what it could be like to combine the scanner/parser, here is the same query from the scanner example: SELECT 1 --comment

Tokens:

AST:

[
{
"RawStmt": {
"stmt": {
"SelectStmt": {
"targetList": [
{
"ResTarget": {
"val": {
"A_Const": {
"val": {
"Integer": {
"ival": 1
}
},
"location": 7
}
},
"location": 7
}
}
],
"limitOption": "LIMIT_OPTION_DEFAULT",
"op": "SETOP_NONE"
}
}
}
}
]

I think it could be good to get a few more scanner examples and empirically look at the locations, to see what pattern emerges. With this simple example, it seems that we only get location on a couple of the nodes.
Yeah, that’s how it was before; the AST has insufficient information. But the scanner definitely opens up a new avenue. I always thought we’d need a custom parser to do what we’re trying to do, but building one atop Postgres’ own scanner would give me much greater confidence. Quite a few of the JS scanners seem to have issues scanning dollar-quoted strings, not to mention some of the more esoteric parts of Postgres’ scanner.
Sounds like the scanner would be useful regardless! Would it be worthwhile to file a feature request with Postgres to optionally store start and end location information in the AST, or even to store comments in the AST? Do such feature requests already exist?

As an aside, I personally suspect it may be possible to build an ~adequate pretty-printer with only start location info in the AST, as long as the token stream has comment contents and start/end locations, but the comments would definitely end up in the "wrong" place more frequently. Whether or not this is the case isn't material to our conversation here, though.
I think the main question would be: is there a use case in the Postgres server itself that warrants adding this? I suspect the trade-off here would make this a difficult conversation (since storing this additional data consumes memory, and there isn't much gain for Postgres itself in having parse tree nodes for comments).

Important context: even the comments in the scanner are not something that Postgres itself emits today, but rather come from a small patch I added in pg_query (see https://github.com/pganalyze/libpg_query/blob/13-latest/patches/04_lexer_comments_as_tokens.patch).

You could consider teaching the parser to create the AST nodes, and we could carry this as a patch in libpg_query. I don't have time to work on that myself at this point, but I'm open to reviewing a PR if someone wants to work on it.
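As a sketch of what consuming that patched scanner output could look like: the token shape below is modeled loosely on libpg_query's scan result, where each token carries start/end offsets and a token name (SQL_COMMENT for `--` comments, C_COMMENT for `/* */` blocks). The exact JS shape depends on the binding, so treat the interface as an assumption.

```typescript
// Assumed token shape, loosely modeled on libpg_query's scan output.
interface ScanToken {
  start: number;
  end: number;
  token: string; // e.g. "SELECT", "ICONST", "SQL_COMMENT", "C_COMMENT"
}

// Pull the comment tokens out of a scan result, recovering their text
// from the original SQL via the start/end offsets.
function extractComments(sql: string, tokens: ScanToken[]) {
  return tokens
    .filter((t) => t.token === "SQL_COMMENT" || t.token === "C_COMMENT")
    .map((t) => ({
      start: t.start,
      end: t.end,
      text: sql.slice(t.start, t.end),
    }));
}
```

A list like this, combined with whatever `location` info the parse tree does carry, is the raw material a pretty-printer would need for re-attaching comments.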
Ah, got it – thank you! I may not be able to do that right away either, so I opened an issue here; I hope that's okay: pganalyze/libpg_query#103
Hey there,
Neither this nor https://github.com/pyramation/pgsql-parser clarifies whether the AST it produces includes comments by default, or whether there is a way to include SQL comments.
Now that libpg_query 2.0 supports comment emission, it would be nice to document that (and, if not implemented, implement it). This would be useful for implementing deparsers/beautifiers like prettier-plugin-pg.
Thanks!