Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-35599: Replace (u)llstr() with %llu/d #3724

Draft
wants to merge 10 commits into
base: main
Choose a base branch
from
Draft

MDEV-35599: Replace (u)llstr() with %llu/d #3724

wants to merge 10 commits into from

Conversation

ParadoxV5
Copy link
Contributor

@ParadoxV5 ParadoxV5 commented Dec 30, 2024

  • The Jira issue number for this PR is: MDEV-35599

Description

llstr() is a workaround for printing longlongs from the times when %llu was not supported
@vuvova, #3360 (comment)

What problem is the patch trying to solve?

Nearly all of (u)llstr() calls were preparations for calls to the printf extended family; but both the C/C++ printf family and our my_snprintf() & descendants already support long longs through the ll size specifier.
Inlining llstr() saves an additional function call each everywhere, which

  • reduces cognitive load
  • potentially improves performance
  • mitigates buffer overruns from if1 the long long is too large for the buffer given to llstr()

Original Question: Which llstr(value, buff) can we replace with sprintf(buff, "%lld", value)?

Answer: all but about a dozen of a few hundred of them!
I approached this task by deleting (u)llstr() functions entirely, which leaves these dozen-ish ones to call longlong10_to_str(value,buff,-10); directly.
As this longlong10_to_str() will be incompatible with MDEV-26896’s -Wconversion, that task (whenever it begins) will most likely revive (u)llstr() to split longlong10_to_str() as.

Real motivation

I wanted to modernize our ancient code.2
In addition, I visited lots of files across components during project-wide cleanups like this and #3360, which helps me familiarize with the repository (and its state of maintenance) as a new recruit of the Corp.

Non-goals

  • Check the correctness of the types with or without this rewrite
    • Particularly, I feel that many llstr() calls (replaced with "%lld") should rather use ullstr() (which would be "%llu")
      • There’s only a few ullstr calls currently.
      • Turning -Wformat-signedness on will warn3 these mistakes.
      • "%"PRId64, "%"PRIu64 and %zu are likely more accurate.
    • Many also explicitly casted their values to longlong
    • I’ve only corrected cases that emitted -Wformat4 complaints, such as sql/sql_test.cc
      • Some latter files used llstr to convert ints and longs to strings… ¯\_(ツ)_/¯
  • Extensive reformat our ancient files
    • You can tell a file’s gone through a history from its mix of tab-space and space-only indents.
      • Speaking of indents – I used a mix of preferences in my diffs… I should chat with others on our preferences vs. practicality.

Release Notes

TODO: Ask QA engineers for their impressions on this housekeeping

How can this PR be tested?

  • Manual: Check the deletion of each llstr() usage that it’s either
    • inlined into its outer printf by replacing %s with %lld
    • or, seldomly (for the aforementioned dozen-ish), entirely replaced with snprintf(…, "%lld", …) longlong10_to_str(…, …, -10)
  • Automatic:
    1. Optional: Create to a temporary branch to merge [MDEV-21978 part 2] Enable and Fix GCC -Wformat Checks on my_vsnprintf and Users #3360 with this commit4
    2. Build the server and verify there’re no compile warnings3
    3. Run the tests to verify that no output or behavior were affected

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

Footnotes

  1. I don’t think we have one right now… 😶‍🌫️

  2. See also: https://justforfunnoreally.dev

  3. We also have -Werror active. 2

  4. -Wformat currently only supports the C/C++ printf family; [MDEV-21978 part 2] Enable and Fix GCC -Wformat Checks on my_vsnprintf and Users #3360 of MDEV-21978 will add support for my_vsnprintf and descendants. 2

> `llstr()` is a workaround for printing longlongs
> from the times when `%llu` was not supported

Nearly all of (`u`)`llstr()` calls were preparations
for calls to the `printf` extended family;
but both the C/C++ `printf` family and our `my_vsnprintf()` &
descendants already support `long long`s via the `ll` size specifier.

Inlining `llstr()` saves an additional function call each *everywhere*,
which reduces cognitive load and potentially improves performance too.
Copy link
Member

@vuvova vuvova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! these *llstr() annoyed me for quite a while. thanks!

 35 files changed, 656 insertions(+), 1030 deletions(-)

And -400 lines, no bad.

See comments inline.

client/mysqladmin.cc Outdated Show resolved Hide resolved
@@ -821,8 +820,7 @@ static bool print_row_event(PRINT_EVENT_INFO *print_event_info, Log_event *ev,
if (is_stmt_end && !result)
{
if (print_event_info->print_row_count)
fprintf(result_file, "# Number of rows: %s\n",
llstr(print_event_info->row_events, ll_buff));
fprintf(result_file, "# Number of rows: %lld\n", print_event_info->row_events);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

row_events is ulonglong, that is unsigned. Were you getting a compiler error here?

you wrote it's a non-goal, so I will no longer comment on signed/unsigned mismatch, but would the code compile without you fixing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it’s impractical to track typedefs manually when we can automatically catch them by turning on -Wformat-signedness, so I explicitly excluded this step.

From what I can tell, neither -Wformat nor -Wall include -Wformat-signedness, so issues like this continue not spawning compile warnings.

I mentioned -Wformat-signedness as a comment of MDEV-26896.
We can prioritize it here or we can push for MDEV-26896 to ship with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Turning on -Wconversion (MDEV-26896) will complain about cases like these before this replacement.
🧐 It will also complain about char *ullstr(longlong value,char *buff) because it delegates a ulonglong arg to longlong10_to_str(value,buff,10); which is sign-agnostic.

sql/item_func.cc Outdated Show resolved Hide resolved
sql/slave.cc Outdated Show resolved Hide resolved
sql/slave.cc Outdated Show resolved Hide resolved
storage/maria/ma_recovery_util.c Outdated Show resolved Hide resolved
@@ -335,21 +332,19 @@ int chk_size(HA_CHECK *param, register MI_INFO *info)
{
error=1;
mi_check_print_error(param,
"Size of indexfile is: %-8s Should be: %s",
llstr(size,buff), llstr(skr,buff2));
"Size of indexfile is: %-8lld Should be: %lld", size, skr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%-8lld -> %lld ?

same below and in Aria

llstr(info->s->base.max_key_file_length-1,buff));
mi_check_print_warning(param,
"Keyfile is almost full, %10lld of %10lld used",
info->state->key_file_length, info->s->base.max_key_file_length-1);
Copy link
Member

@vuvova vuvova Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll stop commenting on the width in %lld. I think it make sense to specify the width in

  • status output printfs, that print many rows and we want numbers to be aligned
  • a warning/error that can be reported many times (e.g. some link is broken) and we want all numbers there to be aligned

if it's a warning/error that is reported only once per file like "file almost full", something in the header is wrong, etc — it doesn't need to align its numbers as there's nothing to align them with

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also did not question the purpose of minimum-length specifiers inherited from the original code.
I’ll think about it.

"Can't find key for index: %2d",
llstr(start_recpos,llbuff),key+1);
start_recpos, key+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the errors, I suppose, that can be reported many times for different records. So it kind of makes sense to align the offset here

storage/myisam/myisamchk.c Outdated Show resolved Hide resolved
Copy link
Contributor Author

@ParadoxV5 ParadoxV5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven’t checked CI and the feedback’s already in.
Thank you, boss!

@@ -821,8 +820,7 @@ static bool print_row_event(PRINT_EVENT_INFO *print_event_info, Log_event *ev,
if (is_stmt_end && !result)
{
if (print_event_info->print_row_count)
fprintf(result_file, "# Number of rows: %s\n",
llstr(print_event_info->row_events, ll_buff));
fprintf(result_file, "# Number of rows: %lld\n", print_event_info->row_events);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it’s impractical to track typedefs manually when we can automatically catch them by turning on -Wformat-signedness, so I explicitly excluded this step.

From what I can tell, neither -Wformat nor -Wall include -Wformat-signedness, so issues like this continue not spawning compile warnings.

I mentioned -Wformat-signedness as a comment of MDEV-26896.
We can prioritize it here or we can push for MDEV-26896 to ship with it.

sql/item_func.cc Outdated Show resolved Hide resolved
storage/maria/aria_chk.c Outdated Show resolved Hide resolved
storage/maria/aria_chk.c Outdated Show resolved Hide resolved
storage/maria/ma_recovery_util.c Outdated Show resolved Hide resolved
llstr(info->s->base.max_key_file_length-1,buff));
mi_check_print_warning(param,
"Keyfile is almost full, %10lld of %10lld used",
info->state->key_file_length, info->s->base.max_key_file_length-1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also did not question the purpose of minimum-length specifiers inherited from the original code.
I’ll think about it.

sql/sql_prepare.cc Outdated Show resolved Hide resolved
@@ -1538,8 +1537,7 @@ static void test_prepare()
fprintf(stdout, "\n\t tiny : %d (%lu)", tiny_data, length[0]);
fprintf(stdout, "\n\t short : %d (%lu)", small_data, length[3]);
fprintf(stdout, "\n\t int : %d (%lu)", int_data, length[2]);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size specifier for char is hh.
But since varargs smaller than int get auto-promoted to int, I won’t judge it here (or press for h support in my_vsnprintf).

This “long” type applied on an int data itches me, though –
They probably meant Maria-SQL INT, which is better defined with int32_t (and format with "%"PRId32) to strictly match INT4.

storage/maria/ma_check.c Show resolved Hide resolved
Comment on lines +2012 to +2013
if (my_b_printf(file, "SET @@session.sql_mode=%llu%s\n",
sql_mode, print_event_info->delimiter))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my_b_printf does not support and ignores %llu, causing segfaults here and likely erroneous outputs in other places.

my_b_printf is another one of our printf alternative that formats to my_b_write.
It only supports format specifiers %s, %c, %d/%u and %ld/%lu (yes, they’re specifiers in its lexicon) and ignores others.


Notably, my_b_printf also supports %b and the ` flag for %s similar to my_snprintf; but that’s switching to %sB and %sQ, whereas my_b_printf is a separate implementation independent of that.

When I migrated my_snprintf users, I recognized that my_b_printf isn’t part of the family and skipped over it.
That meant my_b_printf would remain incompatible with -Wformat and be “out of date” with the modified my_snprintf.

Except my_b_printf is causing trouble right now.
@vuvova, what shall we do about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion — rewrite my_b_vprintf copying the logic from my_vfprintf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and they will be consistent until the next time we update the my_snprintf family.

🧐 I’m not sure about that. my_vfprintf works by resizing its allocation until the result fits1, whereas my_b_write writes to a limited buffer that only flushes when full.


Either case, I should mark this PR as on hold.

Footnotes

  1. … which is not efficient; unfortunately, unlike C/C++ snprintf, my_snprintf doesn’t tell the number of bytes it would’ve written if the output buffer’s sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I mean, my_b_vprintf could write to a buffer, resizing it like my_vfprintf does. Agree, it's not the best approach, but fixing it would be a separate task. At the end my_b_vprintf would use my_b_write to write the buffer to IO_CACHE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another idea.

We can modify the implementation of my_vsnprintf to instead build around a writer callback, say bool (*writer)(void *reference, char *string, int string_length) that returns whether the loop should halt.
When respective buffers fill up, regular my_vsnprintf’s callback would return a halt request, while both my_vfprintf’s and my_b_vprintf’s would flush and carry on.

Of course, because my_vsnprintf is divided to many subfunction components, this approach is intrusive and very complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. That's why it wasn't implemented yet. In particular, it wasn't implemented, when we needed my_vfprintf.

A compromise approach — simpler than yours, more complex than what I suggested earlier — would be to make my_vsnprint_ex to return, as you wrote, number of bytes it would have written if the buffer were big enough. In this approach my_vsnprint can still return what it does now, it's a public API function and if we want to be conservative, we won't change it. But the internal my_vsnprint_ex can be changed to return whatever we want, and my_vfprintf can call it directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌 Let’s put then on the backburner for now; I have higher-priority assignments.

@ParadoxV5 ParadoxV5 marked this pull request as draft January 2, 2025 22:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants