Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special-case ## pasting to string/character constants (issue #168) #255

Merged
merged 2 commits into from
Mar 18, 2022

Conversation

patrickdowling
Copy link
Contributor

Building on the approach from PR #169 this special cases the ## pasting in expandHashHash for string and character literals to allow for (user-defined) literals (operator “”).

  • The string/char case is in it’s own if block; it’s not necessarily elegant but trying to maintain a single tokensB/strAB path through the other branches seems harder to follow (*).
  • There’s a dangling TODO because while ## ## or varargs combos appear to work, it’d be great if someone could review that assumption.
  • Added some additional hashhash test cases. Suggestions for other/better boundary cases welcome.
  • In at least two tests (hashhash18, hashhash19) the result of ## isn’t actually useful (and does eventually produce a syntax error in cppcheck). Detecting that seems like a different rabbit hole.

(*) although now it’s not a trivial merge into cppcheck tree. Can’t win…

(**) As an aside there already exists a difference between the handling of operator “” applied to strings/chars or numbers even without macros. But at least with/without macros its now consistent, so this is perhaps a follow-up item.

#define STR(a) a ## _str
#define ULL(a) a ## _ull
#define CC(a) a ## _cc

auto s1 = "ABC"_str;
auto s2 = STR("ABC");

auto i1 = 12345_ull;
auto i2 = ULL(12345);

auto c1 = 'A'_cc;
auto c2 = CC('A');
##AST
[literals.cc:5]
=
|-s1
`-(
  |-operator""_str
  `-,
    |-"ABC" 'const char *'
    `-3 'signed int'

[literals.cc:6]
=
|-s2
`-(
  |-operator""_str
  `-,
    |-"ABC" 'const char *'
    `-3 'signed int'

[literals.cc:8]
=
|-i1
`-12345_ull

[literals.cc:9]
=
|-i2
`-12345_ull

[literals.cc:11]
= 'char'
|-c1 'char'
`-'A' 'char'

[literals.cc:12]
= 'char'
|-c2 'char'
`-'A' 'char'

This enables use of macros to add literals/operator "".
simplecpp.cpp Outdated

static bool isCharConstant(const std::string &s)
{
return s.size() == 3 && (s[0]=='\'') && (s[2]=='\'');
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho this should also be classified as a char constant: '\t'.
a multi char constant is not handled: 'ab'. is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point (and now the comment about ''' in the original thread makes way more sense).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multichar constants don't seem to work with operator "" though, so that might be accidentally intentional. Will investigate further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the pragmatic approach seems to be to relax the restriction to s.size() > 2. That covers ‘a’, ‘1234’, ‘\t’, ‘\000’ etc. but doesn't check the validity, which seem in line with the regular output. Or even > 1 analog to strings since there’s the weird case of ’’.

Multi-character plus user-defined operator "" don't seem to actually compile but clang/gcc will happily preprocess it with -E (godbolt).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about '' , yes that is weird :-)

So yes I think the pragmatic approach sounds good.

simplecpp.cpp Show resolved Hide resolved
run-tests.py Show resolved Hide resolved
@patrickdowling
Copy link
Contributor Author

So, some additional cases to consider down the literals rabbit hole...

#define CH(x) x##_ch

auto s1 = u8"ABC";
auto s2 = CH(u8"ABC");

auto c1 = u8'A';
// auto c2 = CH(u8'A'); => fails

results in

auto s1 = u8"ABC" ;
auto s2 = u8"ABC"_ch ;

auto c1 = u8'A' ;

Interestingly the string with u8 prefix "works" (although because it doesn’t start with “ it’s not special cased) but for chars it doesn’t. This seems to be because for u8"ABC" the A->name test in expandHashHash is true, but false for u8'A'. cppcheck then also doesn't catch the operator "" (because it's a single token?)

[prefix.cc:4]
=
|-s2
`-u8"ABC"_ch

There already exists a isStringLiteral function which could be reused to check prefixes in isStringLiteral and isCharLiteral but it does make me wonder if this is chasing the wrong thing...

@danmar danmar merged commit e7d85ea into danmar:master Mar 18, 2022
@danmar
Copy link
Owner

danmar commented Mar 18, 2022

yay!! I think some users will be very happy!

@danmar
Copy link
Owner

danmar commented Mar 18, 2022

So, some additional cases to consider down the literals rabbit hole

I assumed this could be investigated in separate PRs. If you want to continue working on that feel free to do it. Otherwise I hope we can open some tickets. The problems you see.. is it simplecpp bugs or cppcheck bugs?

@patrickdowling
Copy link
Contributor Author

Great, thanks for the review!
For those additional u8 cases I haven't actually seen a problem, just after reading up literals I wondered what might happen 😬 . So yeah, that's maybe a separate task. The original ## issues that led me here were actually in cppcheck because it was bailing in a project with invalidHashHash before doing anything interesting. That now seems ok - but on that codebase I then have other issues that I haven't investigated deeper yet (but for which I made a trac ticket).

@patrickdowling patrickdowling deleted the issue_168_token_pasting branch March 19, 2022 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants