-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Treat private-use characters like non-printable characters for escaping #4015
Comments
It seems to be a parser bug instead of a serializer bug: https://sass-lang.com/playground/#eJwzNHQoLU5VUCpOLC62Ki4pysxLV7LmUkm0UogxAAJjQ2suh5TUpNJ0BYikXk5qXnpJhoZKoiZMpLA0vyQVJADUlwTWh1tXErquJJCuZKAu3HqS0fUkg/SkAPWkAl2IS1cKuq4UTWsAkQNLNQ==?s=L1C1-L9C43
The proper behavior here should be that at parse phase |
Likely root cause is at this line: https://github.com/sass/dart-sass/blob/9e6e3bfbd28fa07bd0df63cfcb85d2db9ef9b6c2/lib/src/parse/parser.dart#L472 That there is a special logic that when parsing string as identifier, already escaped ascii number 0-9 ( @nex3 I wonder why this special treatment is done during parsing phase. Maybe because escaped 0-9 indicating that this string must always be an identifier? Shouldn't the special escape for numbers at the beginning of an identifier to be applied at serialization based on whether it's outputting an identifier or not? |
The issue here isn't that the space is being added
The space itself is part of the CSS syntax for escape codes (see § 4.3.7. Consume an escaped code point). We want to include this consistently in the canonicalized format of parsed identifiers so that equivalent identifiers are always equal, while also ensuring that there isn't weird behavior like the identifier form of Edit: Sorry, I'm wrong in that the Sass spec does not mandate a space after |
As far as I know the space is optional, and only required is the next token is a space character or hex character? |
That's right, but because the way it's canonicalized is observable—the SassScript value of the identifier This is a downstream effect of the way we use the "unquoted string" datatype to represent not just identifiers but any CSS value we don't have a dedicated type for, including things like plain-CSS functions and so on. Where quoted strings just store their semantic values, unquoted strings store their syntactic values. This means that identifiers are stored escaped, so we need to be consistent about how they're escaped so we don't have weird issues where, for example, |
It seems to me that there is a limitation that we cannot clearly tell the difference between an unquoted identifier string or an unquoted non-identifier string, and that’s why we are just parsing it without decoding the escape sequence to force it to be outputted as is. My question is that why cannot we always decode the escape sequence during parsing stage (as this example in question is not a Sass value from JS script but a value directly in sass source input). In other words, parse |
Thanks for looking into this. Please note that the space seems to be removed in CSS when concatted with text: https://codepen.io/RoelN/pen/zxORvxV But then again I'd expect the output of $one: \31;
div::before {
content: unquote("\"#{$one}#{$one}#{$one}\"");
} to be
(with our without the trailing space) and not
|
I actually investigated this more on Wednesday and concluded that my proposed solution wasn't quite right, and in fact was based on a misunderstanding of the actual behavior being shown above.
Because we use unquoted strings to represent both identifiers and CSS values we otherwise don't parse. We need to be able to distinctly represent the value If I had the language to design over again, I might split up the "unquoted string" data type into separate "identifier string" and "raw string" types, where "identifier string" works like quoted strings and contains semantic code point values and "raw string" works like unquoted strings do today and is a purely syntactic representation. But I think making such a fundamental change at this point would be a lot more trouble than it would be worth.
That's because the space is not part of the string. It's part of the escape code. See the railroad diagram for an escape code, which includes an optional trailing whitespace character. Note that this |
Thanks for the explanation! Just to be absolutely 100% sure, there is no way to avoid variables being parsed? So See this playground: This $aaa: "\61";
div::before {
content: $aaa;
} will always parse to div::before {
content: "a";
} and we can never create the following from Sass:
My use case, if it helps: I want to produce CSS where all Unicode values are readable, so developers can see the actual hex value, like |
@RoelN tl;dr: sass/dart-sass#568 is probably the best solution to your issue. Sass stores (quoted) string values as their semantic contents, so from Sass's perspective there's literally no difference between the value The ideal way to solve this is to add a flag that tells Sass to emit non-ASCII code points as escapes rather than literal characters, which is covered by sass/dart-sass#568. If you don't want to wait for (or contribute to) that issue, the next best way is probably to have a post-processing step that makes this change outside of Sass. |
@nex3 I appreciate the thorough responses! Thanks for helping me understand what's going on. I'll poll with the team on how we want to deal with this, but it looks like Sass' default approach is solid and we'll probably stick to that. |
Example in playground
The following code:
results in the following CSS:
Please see the space added after
\31
. This is being added when using the Unicode value for the number1
, but not for the Unicode value for characters in the PUA. The latter is correct, there shouldn't be a trailing space.The text was updated successfully, but these errors were encountered: