MDEV-35599: Replace (`u`)`llstr()` with `%llu`/`d` #3724

ParadoxV5 · 2024-12-30T05:14:37Z

The Jira issue number for this PR is: MDEV-35599

Description

llstr() is a workaround for printing longlongs from the times when %llu was not supported
⸺ @vuvova, #3360 (comment)

What problem is the patch trying to solve?

Nearly all of (u)llstr() calls were preparations for calls to the printf extended family; but both the C/C++ printf family and our my_snprintf() & descendants already support long longs through the ll size specifier.
Inlining llstr() saves an additional function call each everywhere, which

reduces cognitive load
potentially improves performance
mitigates buffer overruns from if¹ the long long is too large for the buffer given to llstr()

Original Question: Which `llstr(value, buff)` can we replace with `sprintf(buff, "%lld", value)`?

Answer: all but about a dozen of a few hundred of them!
I approached this task by deleting (u)llstr() functions entirely, which leaves these dozen-ish ones to call longlong10_to_str(value,buff,-10); directly.
As this longlong10_to_str() will be incompatible with MDEV-26896’s -Wconversion, that task (whenever it begins) will most likely revive (u)llstr() to split longlong10_to_str() as.

Real motivation

I wanted to modernize our ancient code.²
In addition, I visited lots of files across components during project-wide cleanups like this and #3360, which helps me familiarize with the repository (and its state of maintenance) as a new recruit of the Corp.

Non-goals

Check the correctness of the types with or without this rewrite
- Particularly, I feel that many llstr() calls (replaced with "%lld") should rather use ullstr() (which would be "%llu")
  - There’s only a few ullstr calls currently.
  - Turning -Wformat-signedness on will warn³ these mistakes.
  - "%"PRId64, "%"PRIu64 and %zu are likely more accurate.
- Many also explicitly casted their values to longlong
- I’ve only corrected cases that emitted -Wformat⁴ complaints, such as sql/sql_test.cc
  - Some latter files used llstr to convert ints and longs to strings… ¯\_(ツ)_/¯
Extensive reformat our ancient files
- You can tell a file’s gone through a history from its mix of tab-space and space-only indents.
  - Speaking of indents – I used a mix of preferences in my diffs… I should chat with others on our preferences vs. practicality.

Release Notes

TODO: Ask QA engineers for their impressions on this housekeeping

How can this PR be tested?

Manual: Check the deletion of each llstr() usage that it’s either
- inlined into its outer printf by replacing %s with %lld
- or, seldomly (for the aforementioned dozen-ish), entirely replaced with ~~snprintf(…, "%lld", …)~~ longlong10_to_str(…, …, -10)
Automatic:
1. Optional: Create to a temporary branch to merge [MDEV-21978 part 2] Enable and Fix GCC -Wformat Checks on my_vsnprintf and Users #3360 with this commit⁴
2. Build the server and verify there’re no compile warnings³
3. Run the tests to verify that no output or behavior were affected

Basing the PR against the correct MariaDB version

This is a new feature or a refactoring, and the PR is based against the main branch.
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

I don’t think we have one right now… 😶‍🌫️ ↩
See also: https://justforfunnoreally.dev ↩
We also have -Werror active. ↩ ↩²
-Wformat currently only supports the C/C++ printf family; [MDEV-21978 part 2] Enable and Fix GCC -Wformat Checks on my_vsnprintf and Users #3360 of MDEV-21978 will add support for my_vsnprintf and descendants. ↩ ↩²

> `llstr()` is a workaround for printing longlongs > from the times when `%llu` was not supported Nearly all of (`u`)`llstr()` calls were preparations for calls to the `printf` extended family; but both the C/C++ `printf` family and our `my_vsnprintf()` & descendants already support `long long`s via the `ll` size specifier. Inlining `llstr()` saves an additional function call each *everywhere*, which reduces cognitive load and potentially improves performance too.

vuvova

Looks great! these *llstr() annoyed me for quite a while. thanks!

 35 files changed, 656 insertions(+), 1030 deletions(-)

And -400 lines, no bad.

See comments inline.

client/mysqladmin.cc

vuvova · 2024-12-30T10:40:32Z

client/mysqlbinlog.cc

@@ -821,8 +820,7 @@ static bool print_row_event(PRINT_EVENT_INFO *print_event_info, Log_event *ev,
  if (is_stmt_end && !result)
  {
    if (print_event_info->print_row_count)
-      fprintf(result_file, "# Number of rows: %s\n",
-              llstr(print_event_info->row_events, ll_buff));
+      fprintf(result_file, "# Number of rows: %lld\n", print_event_info->row_events);


row_events is ulonglong, that is unsigned. Were you getting a compiler error here?

you wrote it's a non-goal, so I will no longer comment on signed/unsigned mismatch, but would the code compile without you fixing it?

Yes, it’s impractical to track typedefs manually when we can automatically catch them by turning on -Wformat-signedness, so I explicitly excluded this step.

From what I can tell, neither -Wformat nor -Wall include -Wformat-signedness, so issues like this continue not spawning compile warnings.

I mentioned -Wformat-signedness as a comment of MDEV-26896.
We can prioritize it here or we can push for MDEV-26896 to ship with it.

🤔 Turning on -Wconversion (MDEV-26896) will complain about cases like these before this replacement.
🧐 It will also complain about char *ullstr(longlong value,char *buff) because it delegates a ulonglong arg to longlong10_to_str(value,buff,10); which is sign-agnostic.

sql/item_func.cc

sql/slave.cc

storage/maria/ma_recovery_util.c

vuvova · 2024-12-30T13:11:05Z

storage/myisam/mi_check.c

@@ -335,21 +332,19 @@ int chk_size(HA_CHECK *param, register MI_INFO *info)
    {
      error=1;
      mi_check_print_error(param,
-			   "Size of indexfile is: %-8s        Should be: %s",
-			   llstr(size,buff), llstr(skr,buff2));
+        "Size of indexfile is: %-8lld        Should be: %lld", size, skr);


%-8lld -> %lld ?

same below and in Aria

vuvova · 2024-12-30T13:14:52Z

storage/myisam/mi_check.c

-			   llstr(info->s->base.max_key_file_length-1,buff));
+    mi_check_print_warning(param,
+      "Keyfile is almost full, %10lld of %10lld used",
+      info->state->key_file_length, info->s->base.max_key_file_length-1);


I'll stop commenting on the width in %lld. I think it make sense to specify the width in

status output printfs, that print many rows and we want numbers to be aligned

a warning/error that can be reported many times (e.g. some link is broken) and we want all numbers there to be aligned

if it's a warning/error that is reported only once per file like "file almost full", something in the header is wrong, etc — it doesn't need to align its numbers as there's nothing to align them with

I also did not question the purpose of minimum-length specifiers inherited from the original code.
I’ll think about it.

vuvova · 2024-12-30T13:33:42Z

storage/myisam/mi_check.c

                                     "Can't find key for index: %2d",
-                                     llstr(start_recpos,llbuff),key+1);
+                                     start_recpos, key+1);


this is one of the errors, I suppose, that can be reported many times for different records. So it kind of makes sense to align the offset here

storage/myisam/myisamchk.c

ParadoxV5

I haven’t checked CI and the feedback’s already in.
Thank you, boss!

ParadoxV5 · 2024-12-30T19:18:56Z

client/mysqlbinlog.cc

@@ -821,8 +820,7 @@ static bool print_row_event(PRINT_EVENT_INFO *print_event_info, Log_event *ev,
  if (is_stmt_end && !result)
  {
    if (print_event_info->print_row_count)
-      fprintf(result_file, "# Number of rows: %s\n",
-              llstr(print_event_info->row_events, ll_buff));
+      fprintf(result_file, "# Number of rows: %lld\n", print_event_info->row_events);


Yes, it’s impractical to track typedefs manually when we can automatically catch them by turning on -Wformat-signedness, so I explicitly excluded this step.

From what I can tell, neither -Wformat nor -Wall include -Wformat-signedness, so issues like this continue not spawning compile warnings.

I mentioned -Wformat-signedness as a comment of MDEV-26896.
We can prioritize it here or we can push for MDEV-26896 to ship with it.

sql/item_func.cc

storage/maria/aria_chk.c

storage/maria/ma_recovery_util.c

ParadoxV5 · 2024-12-30T19:52:38Z

storage/myisam/mi_check.c

-			   llstr(info->s->base.max_key_file_length-1,buff));
+    mi_check_print_warning(param,
+      "Keyfile is almost full, %10lld of %10lld used",
+      info->state->key_file_length, info->s->base.max_key_file_length-1);


I also did not question the purpose of minimum-length specifiers inherited from the original code.
I’ll think about it.

sql/sql_prepare.cc

ParadoxV5 · 2024-12-30T21:44:16Z

tests/mysql_client_test.c

@@ -1538,8 +1537,7 @@ static void test_prepare()
      fprintf(stdout, "\n\t tiny   : %d (%lu)", tiny_data, length[0]);
      fprintf(stdout, "\n\t short  : %d (%lu)", small_data, length[3]);
      fprintf(stdout, "\n\t int    : %d (%lu)", int_data, length[2]);


The size specifier for char is hh.
But since varargs smaller than int get auto-promoted to int, I won’t judge it here (or press for h support in my_vsnprintf).

This “long” type applied on an int data itches me, though –
They probably meant Maria-SQL INT, which is better defined with int32_t (and format with "%"PRId32) to strictly match INT4.

storage/maria/ma_check.c

Reviewed-by: Sergei Golubchik <[email protected]>

This one doesn't know the buffer size! So that's how I missed it in the previous commit.

`timeout` is a `double`, so use `double` operations

ParadoxV5 · 2025-01-02T21:39:20Z

sql/log_event_client.cc

+    if (my_b_printf(file, "SET @@session.sql_mode=%llu%s\n",
+                    sql_mode, print_event_info->delimiter))


my_b_printf does not support and ignores %llu, causing segfaults here and likely erroneous outputs in other places.

my_b_printf is another one of our printf alternative that formats to my_b_write.
It only supports format specifiers %s, %c, %d/%u and %ld/%lu (yes, they’re specifiers in its lexicon) and ignores others.

Notably, my_b_printf also supports %b and the ` flag for %s similar to my_snprintf; but that’s switching to %sB and %sQ, whereas my_b_printf is a separate implementation independent of that.

When I migrated my_snprintf users, I recognized that my_b_printf isn’t part of the family and skipped over it.
That meant my_b_printf would remain incompatible with -Wformat and be “out of date” with the modified my_snprintf.

Except my_b_printf is causing trouble right now.
@vuvova, what shall we do about it?

suggestion — rewrite my_b_vprintf copying the logic from my_vfprintf

~~and they will be consistent until the next time we update the my_snprintf family.~~

🧐 I’m not sure about that. my_vfprintf works by resizing its allocation until the result fits¹, whereas my_b_write writes to a limited buffer that only flushes when full.

Either case, I should mark this PR as on hold.

Footnotes

… which is not efficient; unfortunately, unlike C/C++ snprintf, my_snprintf doesn’t tell the number of bytes it would’ve written if the output buffer’s sufficient. ↩

Yes. I mean, my_b_vprintf could write to a buffer, resizing it like my_vfprintf does. Agree, it's not the best approach, but fixing it would be a separate task. At the end my_b_vprintf would use my_b_write to write the buffer to IO_CACHE

I have another idea.

We can modify the implementation of my_vsnprintf to instead build around a writer callback, say bool (*writer)(void *reference, char *string, int string_length) that returns whether the loop should halt.
When respective buffers fill up, regular my_vsnprintf’s callback would return a halt request, while both my_vfprintf’s and my_b_vprintf’s would flush and carry on.

Of course, because my_vsnprintf is divided to many subfunction components, this approach is intrusive and very complex.

Sure. That's why it wasn't implemented yet. In particular, it wasn't implemented, when we needed my_vfprintf.

A compromise approach — simpler than yours, more complex than what I suggested earlier — would be to make my_vsnprint_ex to return, as you wrote, number of bytes it would have written if the buffer were big enough. In this approach my_vsnprint can still return what it does now, it's a public API function and if we want to be conservative, we won't change it. But the internal my_vsnprint_ex can be changed to return whatever we want, and my_vfprintf can call it directly.

👌 Let’s put then on the backburner for now; I have higher-priority assignments.

ParadoxV5 added the MariaDB Corporation label Dec 30, 2024

ParadoxV5 requested a review from vuvova December 30, 2024 05:14

vuvova requested changes Dec 30, 2024

View reviewed changes

ParadoxV5 commented Dec 30, 2024

View reviewed changes

ParadoxV5 added 9 commits December 30, 2024 15:35

squash: general feedback

c6e6ef7

Reviewed-by: Sergei Golubchik <[email protected]>

fixup: use longlong10_to_str over snprintf

8ab1eaf

fixup: the size of mallinfo fields isn't consistent between platforms

0af0489

fixup: use longlong10_to_str over sprintf

8e5ae27

This one doesn't know the buffer size! So that's how I missed it in the previous commit.

fixup: (oops)

482aa63

sql/item_func.cc: don’t cast timeout to longlong

7dacefe

`timeout` is a `double`, so use `double` operations

fixup: revert .0 trick & redo: HA_OFFSET_ERROR isn't 0

f0bf893

fixup: fix timeout test

d7623b3

fixup: I kept putting buff first in longlong10_to_str

2aa992e

ParadoxV5 commented Jan 2, 2025

View reviewed changes

ParadoxV5 marked this pull request as draft January 2, 2025 22:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MDEV-35599: Replace (`u`)`llstr()` with `%llu`/`d` #3724

MDEV-35599: Replace (`u`)`llstr()` with `%llu`/`d` #3724

ParadoxV5 commented Dec 30, 2024 •

edited

Loading

vuvova left a comment

vuvova Dec 30, 2024

ParadoxV5 Dec 30, 2024

ParadoxV5 Dec 30, 2024

vuvova Dec 30, 2024

vuvova Dec 30, 2024 •

edited

Loading

ParadoxV5 Dec 30, 2024

vuvova Dec 30, 2024

ParadoxV5 left a comment

ParadoxV5 Dec 30, 2024

ParadoxV5 Dec 30, 2024

ParadoxV5 Dec 30, 2024

ParadoxV5 Jan 2, 2025

vuvova Jan 2, 2025

ParadoxV5 Jan 2, 2025

vuvova Jan 3, 2025

ParadoxV5 Jan 3, 2025

vuvova Jan 3, 2025

ParadoxV5 Jan 4, 2025

		if (my_b_printf(file, "SET @@session.sql_mode=%llu%s\n",
		sql_mode, print_event_info->delimiter))

MDEV-35599: Replace (u)llstr() with %llu/d #3724

Are you sure you want to change the base?

MDEV-35599: Replace (u)llstr() with %llu/d #3724

Conversation

ParadoxV5 commented Dec 30, 2024 • edited Loading

Description

What problem is the patch trying to solve?

Original Question: Which llstr(value, buff) can we replace with sprintf(buff, "%lld", value)?

Real motivation

Non-goals

Release Notes

How can this PR be tested?

Basing the PR against the correct MariaDB version

PR quality check

Footnotes

vuvova left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vuvova Dec 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ParadoxV5 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Footnotes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MDEV-35599: Replace (`u`)`llstr()` with `%llu`/`d` #3724

MDEV-35599: Replace (`u`)`llstr()` with `%llu`/`d` #3724

ParadoxV5 commented Dec 30, 2024 •

edited

Loading

Original Question: Which `llstr(value, buff)` can we replace with `sprintf(buff, "%lld", value)`?

vuvova Dec 30, 2024 •

edited

Loading