Question: parsing operators and newlines #134

Open · nikomatsakis opened this issue Feb 18, 2022 · 12 comments

Labels: question (Further information is requested)

@nikomatsakis (Member)

How should we think about binary operators and newlines? Rust had to wrestle with the same issue, and I suspect we want the same general answer. I'm referring to things like this:

fn foo() -> {
    if true { 1 } else { 2 }
    -5 # probably wants to return `-5`, not `-4`
}

fn foo() -> {
    a = if true { 1 } else { 2 }
    -5 # probably wants to return `-5` and set `a` to 1
}

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

fn foo() -> {
    a = if false { 1 } else { 2
    -5} # probably wants to set `a` to `-3`, but I'm not entirely sure, especially since the next example...
}

async fn foo() -> {
    a = if false { 1 } else { print(2).await
    -5} # ...probably wants to print the number `2` and then set `a` to `-5`, and not try to subtract `5` from `()`
}

The rule I propose:

Binary operators cannot be preceded by a newline

So that you have to write b - \n 5 and not b \n - 5. That'd be a very simple rule.
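To make the proposed rule concrete, here is a minimal sketch of a toy expression parser that enforces it, written in Python purely for illustration (it is not Dada's actual parser). It assumes a tiny language of integer literals, binary `+`/`-`, and unary `-`; integer literals stand in for the `if` expressions above. The lexer records whether each token was preceded by a newline, and the parser refuses to treat such an operator as binary, so it starts a new expression instead. The names `Token`, `tokenize`, `Parser`, and `parse_block` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

DIGITS = "0123456789"


@dataclass
class Token:
    kind: str        # "int" or "op"
    text: str
    nl_before: bool  # True if a newline separates this token from the previous one


def tokenize(src: str) -> List[Token]:
    """Toy lexer: non-negative integer literals, `+`, `-`, and whitespace."""
    tokens: List[Token] = []
    i, saw_newline = 0, False
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            saw_newline = saw_newline or ch == "\n"
            i += 1
        elif ch in DIGITS:
            j = i
            while j < len(src) and src[j] in DIGITS:
                j += 1
            tokens.append(Token("int", src[i:j], saw_newline))
            i, saw_newline = j, False
        elif ch in "+-":
            tokens.append(Token("op", ch, saw_newline))
            i, saw_newline = i + 1, False
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


class Parser:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[Token]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Token:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def unary(self) -> int:
        tok = self.advance()
        if tok.kind == "op" and tok.text == "-":
            return -self.unary()
        if tok.kind == "int":
            return int(tok.text)
        raise SyntaxError(f"unexpected token {tok.text!r}")

    def expr(self) -> int:
        value = self.unary()
        while True:
            tok = self.peek()
            # The proposed rule: an operator preceded by a newline is never binary,
            # so it ends the current expression (and starts the next one).
            if tok is None or tok.kind != "op" or tok.nl_before:
                return value
            self.advance()
            rhs = self.unary()
            value = value + rhs if tok.text == "+" else value - rhs

    def block(self) -> List[int]:
        """Parse a sequence of expressions and return each one's value."""
        values: List[int] = []
        while self.peek() is not None:
            values.append(self.expr())
        return values


def parse_block(src: str) -> List[int]:
    return Parser(tokenize(src)).block()
```

With this rule, `parse_block("1\n-5")` yields two expressions, `[1, -5]`, while `parse_block("1 -\n5")` yields the single value `[-4]`. The sketch does not require a separator between two expressions on the same line (`1 2` parses as two expressions); a real grammar would have to decide that separately.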

Other rules I can imagine:

  • Statement-like expressions (e.g., if), when followed by a newline, do not accept binary operators.

But I'd rather not have to reason like that, it makes the grammar really complex.

Originally posted by @nikomatsakis in #129 (comment)

@nikomatsakis (Member, Author)

cc @xffxff

nikomatsakis added the question label on Feb 18, 2022
@nikomatsakis (Member, Author) commented Feb 18, 2022

Note that the rule I proposed would make this code:

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

set a to -5 and discard the result of the if.

Ah, I just remembered: I think I generally permitted newlines instead of commas inside vectors and similar things (I should write some tests for that...), so this would fit with that. For example, this is legal Dada right now (playground):

fn subtract(a, b) {
    a - b
}

fn main() {
    print(subtract(
        5
        3
    )).await #! OUTPUT 2
}

and hence:

fn subtract(a, b) {
    a - b
}

fn main() {
    print(subtract(
        5
        - 3
    )).await #! OUTPUT 8
}
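The two `subtract` calls above can be checked with a small sketch of argument-list parsing in which either a comma or a newline ends an argument, and a leading `-` on a new line therefore starts a new argument rather than continuing the previous one. This is a hypothetical Python model, not Dada's implementation: it assumes the same toy language of integers, `+`, and `-`, and the names `tokenize`, `parse_args`, and `subtract` are illustrative only.

```python
import re
from typing import List, Tuple

# Toy lexer: integers, `+`, `-`, and `,`. Each token is paired with a flag
# recording whether a newline appeared between it and the previous token.
TOKEN_RE = re.compile(r"\d+|[+\-,]")


def tokenize(src: str) -> List[Tuple[str, bool]]:
    tokens: List[Tuple[str, bool]] = []
    last_end = 0
    for m in TOKEN_RE.finditer(src):
        gap = src[last_end:m.start()]
        if gap.strip():
            raise SyntaxError(f"unexpected text {gap.strip()!r}")
        tokens.append((m.group(), "\n" in gap))
        last_end = m.end()
    if src[last_end:].strip():
        raise SyntaxError(f"unexpected text {src[last_end:].strip()!r}")
    return tokens


def parse_args(src: str) -> List[int]:
    """Parse the inside of an argument list; a comma OR a newline ends an argument."""
    toks = tokenize(src)
    pos = 0

    def unary() -> int:
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of input")
        text, _ = toks[pos]
        pos += 1
        if text == "-":
            return -unary()
        if text.isdigit():
            return int(text)
        raise SyntaxError(f"unexpected token {text!r}")

    def expr() -> int:
        nonlocal pos
        value = unary()
        # Continue with a binary operator only if no newline precedes it.
        while pos < len(toks) and toks[pos][0] in ("+", "-") and not toks[pos][1]:
            op = toks[pos][0]
            pos += 1
            rhs = unary()
            value = value + rhs if op == "+" else value - rhs
        return value

    args: List[int] = []
    while pos < len(toks):
        args.append(expr())
        if pos < len(toks):
            text, newline_before = toks[pos]
            if text == ",":
                pos += 1  # explicit separator (a trailing comma is accepted)
            elif not newline_before:
                raise SyntaxError("arguments must be separated by a comma or a newline")
    return args


def subtract(a: int, b: int) -> int:
    return a - b
```

Under these assumptions, `subtract(*parse_args("\n    5\n    3\n"))` returns 2 and `subtract(*parse_args("\n    5\n    - 3\n"))` returns 8, matching the `#! OUTPUT` comments above, because `- 3` on its own line becomes the second argument `-3`.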

@nikomatsakis (Member, Author)

My thinking was that we could just avoid the whole "trailing comma" question altogether and use newlines. Not sure if that was a good idea. =)

@brson (Contributor) commented Feb 21, 2022

Given

Binary operators cannot be preceded by a newline

then

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

doesn't seem like it would parse unless (Expr Expr) parses. Is it going to? That would make blocks and parens, i.e. (Expr Expr) and {Expr Expr}, the same?

Having the grammar be newline-sensitive doesn't appeal to me much; I didn't realize Rust did this. (Edit: but now that I think about it, this is probably the special rule about parsing control structures that I always knew Rust had but couldn't remember the details of.)

This problem seems similar to the disambiguation of tuples and function calls in #117, and could be solved the same way, where an opening paren in a function call can't be split onto a new line.

@brson (Contributor) commented Feb 21, 2022

fn foo() -> {
    a = (if true { 1 } else { 2 }
    -5) # probably wants to set `a` to `-4` and return `()`? Not sure.
}

Cases like this sure do look confusing.

The rules could be different inside { } vs inside ( ) or the compiler could lint against it inside ( ) in a way that would persuade people never to write such code.

@brson (Contributor) commented Feb 21, 2022

Another possible solution, for the binop case in particular, would be to require binary operators to always be space-delimited and unary operators never to be: 1 - 2 vs. -2.
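A minimal sketch of how a lexer could apply this spacing rule to `-`, written in Python for illustration. The exact rule is an assumption: whitespace on both sides means binary; no whitespace after, with whitespace, `(`, `[`, or start of input before, means unary; anything else (`1-2`, `1- 2`, `- 2`) is rejected. The function name `classify_minuses` is hypothetical, and the sketch ignores comments and string literals.

```python
from typing import List


def classify_minuses(src: str) -> List[str]:
    """Classify every `-` in `src` as "binary" or "unary" under the spacing rule.

    binary: whitespace on both sides                          (1 - 2)
    unary:  an operand directly after it, and whitespace,
            `(`, `[`, or the start of input directly before   (-2)
    Anything else raises SyntaxError.
    """
    kinds: List[str] = []
    for i, ch in enumerate(src):
        if ch != "-":
            continue
        prev = src[i - 1] if i > 0 else ""
        nxt = src[i + 1] if i + 1 < len(src) else ""
        space_before = prev.isspace()
        space_after = nxt.isspace()
        if space_before and space_after:
            kinds.append("binary")
        elif nxt and not space_after and (prev == "" or space_before or prev in ("(", "[")):
            kinds.append("unary")
        else:
            raise SyntaxError(f"`-` at offset {i} is neither `a - b` nor `-a`")
    return kinds
```

One consequence worth noting: under this rule the original `if true { 1 } else { 2 }` followed by `-5` on the next line already reads as a unary minus, because nothing separates `-` from `5`, so it would also resolve that case without looking at newlines.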

@brson (Contributor) commented Feb 21, 2022

Another newline-sensitive solution that might handle multiple cases in this issue is that for every sequence of expressions both newlines and commas act as separators, with the separators having precedence over continuing to parse the current expression.

@nikomatsakis (Member, Author)

@brson

Rust doesn't make the grammar newline-sensitive, but it distinguishes uses of things like if ... { } else { } in "statement position" from uses elsewhere. It further requires that a "statement-like" if (etc.) has () type. That's why this program doesn't type check.

doesn't seem like it would parse, unless (Expr Expr) parses - is it going to? That would make blocks and parens, (Expr Expr) and {Expr Expr} .... the same?

Good point, I think that I meant to have () behave differently with respect to newlines than other things.

@nikomatsakis (Member, Author)

I'll have to ponder the other suggestions. I also don't love whitespace- or newline-sensitive grammars, but I think it's worth trying to do without ;. It leads to some interesting places. I would like the grammar to be 'minimally' whitespace-sensitive: I think rules like "cannot be separated by whitespace" (e.g., - 5 and -5 are not the same) or "cannot have a newline" are OK. I would not want more than that, because I love being able to have an "autoformat on save" just clean up a bunch of gunk I just wrote and line things up correctly. When using Python a lot, I also found that it was easy to lose indentation when copy-pasting or at other times, and that could be quite confusing to debug.

@brson (Contributor) commented Feb 28, 2022

I think we might as well implement the rule you suggest, at least for now. I'd love to get the reference grammar and production grammar in agreement so they can be kept in sync forever. Parol has some ability to turn on and off newline sensitivity based on context, so I think it should be able to handle the rule.

Just one more thing to point out: it's been a long time since I read Code Complete, but one bit that has stuck with me is the suggestion that breaking a line before a binary operator reads better than breaking after it. That is, this:

let x = foo
    + bar
    - baz
    / qux

is easier to scan than

let x = foo +
    bar -
    baz /
    qux

and the proposed rule makes that formatting not possible.

@nikomatsakis (Member, Author)

Big +1 to getting ref / actual grammar in sync.

I did consider that the rule would mean you can't move operators to the start of the line. I thought that style wasn't as popular for some reason, but checking rustfmt, it does at least move operators to the beginning of the line (example).

One other consideration: requiring that binary operators be separated by whitespace would resolve the foo<T> vs foo < T ambiguity as well, right?
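As a rough illustration of that idea, here is a Python sketch of how a lexer might decide what a `<` means if binary operators must be space-delimited. The rule is an assumption for illustration: a `<` with whitespace on both sides is less-than, a `<` glued to a preceding identifier character opens a generic argument list, and anything else is an error. The name `classify_angle` is hypothetical, and `<=`, `<<`, and the closing `>` are out of scope (`x <= y` would be rejected here).

```python
from typing import List


def classify_angle(src: str) -> List[str]:
    """Classify each bare `<` as "less-than" or "generic-open" by its spacing."""
    kinds: List[str] = []
    for i, ch in enumerate(src):
        if ch != "<":
            continue
        prev = src[i - 1] if i > 0 else ""
        nxt = src[i + 1] if i + 1 < len(src) else ""
        if prev.isspace() and nxt.isspace():
            kinds.append("less-than")      # foo < T
        elif prev.isalnum() or prev == "_":
            kinds.append("generic-open")   # foo<T>
        else:
            raise SyntaxError(f"`<` at offset {i} must be space-delimited or follow a name")
    return kinds
```

The cost of this design is the one already discussed for `-`: `a<b` can no longer mean a comparison, so the spacing carries meaning that an autoformatter must preserve.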

@nikomatsakis (Member, Author)

Another thought that I had:

Maybe if true { ... } else { ... } and friends should just always require parentheses if you plan to apply an operator to them? I feel like it's kind of hard to read anyway. Some examples:

fn foo() -> {
    if true { 1 } else { 2 } - 5
    (if true { 1 } else { 2 }) - 5
}

fn foo() -> {
    if true { 1 } else { 2 }.share
    (if true { 1 } else { 2 }).share
}

Not sure, the parens don't look great. Going to leave this comment for posterity's sake at least though. :)
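If that rule were adopted, it could be enforced as a simple check over the parsed tree. The Python sketch below assumes a hypothetical AST (`Int`, `Bool`, `If`, `Paren`, `BinOp`, `Field`; none of these are Dada's real types) and rejects an unparenthesized `if` when it is the left operand of a binary operator or the receiver of a field access such as `.share`. Checking only the left side is itself an assumption; an `if` on the right of an operator is less ambiguous.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Int:
    value: int


@dataclass
class Bool:
    value: bool


@dataclass
class If:        # if cond { then } else { else_ }
    cond: "Expr"
    then: "Expr"
    else_: "Expr"


@dataclass
class Paren:     # ( inner )
    inner: "Expr"


@dataclass
class BinOp:     # lhs op rhs
    op: str
    lhs: "Expr"
    rhs: "Expr"


@dataclass
class Field:     # base.name, e.g. `.share`
    base: "Expr"
    name: str


Expr = Union[Int, Bool, If, Paren, BinOp, Field]


def check(e: Expr) -> None:
    """Raise SyntaxError if an unparenthesized `if` is the left operand of an
    operator or the receiver of a field access; otherwise recurse."""
    if isinstance(e, BinOp):
        if isinstance(e.lhs, If):
            raise SyntaxError(f"parenthesize the `if` before applying `{e.op}`")
        check(e.lhs)
        check(e.rhs)
    elif isinstance(e, Field):
        if isinstance(e.base, If):
            raise SyntaxError(f"parenthesize the `if` before `.{e.name}`")
        check(e.base)
    elif isinstance(e, If):
        check(e.cond)
        check(e.then)
        check(e.else_)
    elif isinstance(e, Paren):
        check(e.inner)
```

With `cond = If(Bool(True), Int(1), Int(2))`, the tree for `(if true { 1 } else { 2 }) - 5` passes, while `BinOp("-", cond, Int(5))` raises.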
