Replace while loop in generic C/ZGEMM_BETA to avoid going out of bounds #5057

martin-frbg · 2025-01-08T22:19:12Z

Enchufa2 · 2025-01-09T08:39:14Z

Subtle one. I'll give it a spin later.

Did you figure out why this happens to cause problems only in aarch64 with 12 or more threads?

martin-frbg · 2025-01-09T08:59:44Z

aarch64 is the only widely used architecture for which OpenBLAS currently has no optimized GEMM_BETA kernels at all, so every single CPU will use this C source. And I cannot confirm that 12 threads are special - it seems to be more a question of memory layout in the individual run, with the outcome depending on what happens to be sitting in the overrun area. I have not been able to provoke the bug without the extra protector options, and even with them there was only something like a 10 to 20 percent chance of failure on any individual run, which might easily be missed if you're just building and testing once.
But I have not done an in-depth analysis of this yet; all I can really say is that the code survived several hundred re-runs of the LAPACK test with this change.
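
As a rough illustration of why a single build-and-test pass could miss this, here is a back-of-the-envelope estimate. It assumes every run fails independently with the same probability $p$, which nothing in this thread establishes, and it takes $n = 200$ as a stand-in for "several hundred" re-runs.

$$
P(\text{one run passes}) = 1 - p \in [0.8,\ 0.9] \quad \text{for } p \in [0.1,\ 0.2],
\qquad
P(n \text{ runs all pass}) = (1 - p)^{n}
$$

With the assumed $p = 0.1$ and $n = 200$, $(0.9)^{200} \approx 7 \times 10^{-10}$. Under these assumptions a single test run passes most of the time even with the overrun present, while a clean streak of a few hundred runs would be very unlikely unless the failure rate had actually dropped.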

Enchufa2 · 2025-01-09T09:11:04Z

Could #4917 be a consequence of this?

Enchufa2 · 2025-01-09T09:37:24Z

And BTW... the valgrind output I collected reported invalid writes in line zgemm_beta.c:69 too. Shouldn't the loop in line zgemm_beta.c:62 be replaced in the same way?

martin-frbg · 2025-01-09T09:58:14Z

Not sure about #4917, as that was reproducibly returning sloppy results for existing data (if only for select thread counts). And yes, the zeroing loop will need identical treatment if this stage of the PR is correct at all.
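
For readers following along, the idea of giving both branches the same bounded-loop treatment can be sketched as below. This is a minimal illustration, not the code in zgemm_beta.c: the function name `zscale_beta_sketch`, its parameter list, and the interleaved real/imaginary column-major layout are assumptions made for the example. Each `for` loop tests its index against `m` or `n` before every access, so neither the zeroing branch nor the scaling branch can step past the rows that belong to the matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT the OpenBLAS kernel): scale an m x n
 * column-major complex matrix by beta = beta_r + i*beta_i.
 * Elements are stored as interleaved (re, im) doubles and ldc is the
 * leading dimension counted in complex elements. */
void zscale_beta_sketch(size_t m, size_t n,
                        double beta_r, double beta_i,
                        double *c, size_t ldc)
{
    assert(ldc >= m); /* each column must have room for m rows */

    for (size_t j = 0; j < n; j++) {
        double *col = c + 2 * j * ldc; /* start of column j */

        if (beta_r == 0.0 && beta_i == 0.0) {
            /* beta == 0: store zeros instead of multiplying, so any
             * NaN or Inf already in C does not survive. */
            for (size_t i = 0; i < m; i++) {
                col[2 * i]     = 0.0;
                col[2 * i + 1] = 0.0;
            }
        } else {
            /* General case: complex multiply of each element by beta. */
            for (size_t i = 0; i < m; i++) {
                double re = col[2 * i];
                double im = col[2 * i + 1];
                col[2 * i]     = beta_r * re - beta_i * im;
                col[2 * i + 1] = beta_r * im + beta_i * re;
            }
        }
    }
}
```

One way to exercise a routine like this is to surround the matrix with known guard values and check that they are unchanged afterwards. That makes an overrun visible deterministically, instead of depending on what happens to sit in the overrun area.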

Enchufa2 · 2025-01-09T10:19:09Z

Ok, I'm building in Copr. I've reproduced the crash by setting OMP_NUM_THREADS in FlexiBLAS' check stage. I'm building OpenBLAS there now with this patch, I just started another one with the loop in zgemm_beta.c:62 patched too, and I'll spin new FlexiBLAS builds on top of them when they finish.

sharkcz · 2025-01-09T11:05:56Z

looks good here, thanks

martin-frbg · 2025-01-09T11:22:06Z

> looks good here, thanks

Thanks for testing - this is still quite weird to me, as the code must have been like that for at least 15 years, if not 20 - granted, aarch64 gave it much more exposure lately.

Enchufa2 · 2025-01-09T11:23:20Z

@sharkcz You mean that the issue is not reproduced anymore in the Ampere MtSnow system? Because this patch still crashes for me (see 8492896 on top of 8492496 here). I'm building now with the other loop patched too (8492760), and I'll spin another FlexiBLAS build to see what happens.

sharkcz · 2025-01-09T11:36:00Z

> @sharkcz You mean that the issue is not reproduced anymore in the Ampere MtSnow system? Because this patch still crashes for me (see 8492896 on top of 8492496 here). I'm building now with the other loop patched too (8492760), and I'll spin another FlexiBLAS build to see what happens.

correct, I have built updated openblas with this fix (in rawhide mock), installed new rpms into a new rawhide buildroot and successfully rebuilt flexiblas there

Enchufa2 · 2025-01-09T11:39:29Z

> correct, I have built updated openblas with this fix (in rawhide mock), installed new rpms into a new rawhide buildroot and successfully rebuilt flexiblas there

With the most current commit in rawhide? Because yesterday I limited the number of threads to 10 in order to avoid the crash and be able to rebuild FlexiBLAS in rawhide, because it was affected by the retirement of ATLAS.

Enchufa2 · 2025-01-09T12:12:07Z

Ok, good news: the new build succeeded where the others failed. So I can confirm that patching the zeroing loop in zgemm_beta.c:62 on top of your changes here does fix the issue.

sharkcz · 2025-01-09T12:20:36Z

> > correct, I have built updated openblas with this fix (in rawhide mock), installed new rpms into a new rawhide buildroot and successfully rebuilt flexiblas there
>
> With the most current commit in rawhide? Because yesterday I limited the number of threads to 10 in order to avoid the crash and be able to rebuild FlexiBLAS in rawhide, because it was affected by the retirement of ATLAS.

openblas is from rawhide HEAD, but flexiblas from https://src.fedoraproject.org/rpms/flexiblas/c/e10825622fc90f7405e4791062e6b433822a62c8?branch=rawhide (before your workaround)

Replace while loop with for

2891fd8

martin-frbg added this to the 0.3.29 milestone Jan 8, 2025

martin-frbg mentioned this pull request Jan 8, 2025

LAPACK test failure with 3.28 on aarch64 #5050

Open

fix absurd typo

09e75f1

martin-frbg added 2 commits January 9, 2025 14:09

Merge branch 'OpenMathLib:develop' into issue5050

8cc32f5

convert the beta=0 branch to a for loop as well

d91d4fa
