LAPACK test failure with 3.28 on aarch64 #5050
Is there a way to get the …
On which flavour of aarch64 / which build TARGET do you see this? (At first glance it looks like a malloc error; it does not look familiar.)
This is the make command:
What is doubly strange is that the error has only been reproduced on the Fedora koji builders, not on other aarch64 test machines.
Is koji running on actual hardware, or on virtual machines (which may be resource-limited, or may not report hardware properties like cache sizes correctly)? The COMPLEX/COMPLEX16 parts of the LAPACK testsuite are a bit more memory-hungry than the other tests.
Also, would it be possible to add OPENBLAS_VERBOSE=2 to the environment, to see which CPU gets autodetected during the test phase of this DYNAMIC_ARCH build? (Though I think it is at least as likely that the out-of-bounds access is a bug in the test code OpenBLAS imports from Reference-LAPACK; maybe it is the glibc version in your koji setup that catches it. ISTR there were a few test-code fixes in Reference-LAPACK that I copied for 0.3.29, or still need to copy before its release.)
Just a note that 0.3.27 seems to be okay. koji is using VMs with what should be fairly decent memory - I'll get a number at some point and try with the verbose setting.
Hmm. The only remotely relevant change in 0.3.28 that I can identify is the addition of vector registers to the clobber list of the cdot/zdot assembly kernel used primarily on AppleM, ThunderX2 and Graviton2 - if anything this should have improved that kernel, certainly not led to memory overruns. (CSYL01 tests CTRSYL, which uses very few external functions, most notably CDOT.)
FWIW I have reproduced the issue on our bare-metal Ampere MtSnow system (80 cpus) doing a rawhide mock build for flexiblas. |
Thanks - that would be NeoverseN1, which (at least in theory) should be rather well tested, though perhaps not with all your additional compiler options. I'll see if I can reproduce this in the GCC Compile Farm.
In valgrind @opoplawski got:
I cannot find the reference to …
Thanks - my valgrind run has not reported anything interesting so far.
Seems extremely unlikely to me; if anything, the older nrm2 kernels that these reverted to have had much more exposure. I also don't think we have NRM2 anywhere on the call graph of CSYL01 testing CTRSYL (the hit in cgemm_beta makes it look as if the fault is coming from the test code rather than the function under test, but maybe that was a false positive from valgrind).
I cannot reproduce the error with gcc 14.2 and all your build options except the special spec files (cfarm425 runs Debian; my other option would be "Rocky 9.5", but it looks like I'd need to build my own gcc there first to get anything recent). As an unwanted side effect, the installed valgrind 3.2 trips over something in the binary when I use your build options.
FYI, newer gcc versions are available via the Developer Toolset (not sure if it's still called that) for RHEL-based distros.
Can I install them as a non-privileged user?
They are (usually) in a different repo, but still as RPMs, so admin privileges are needed.
I can run more checks if I get some details on how, as I am not really familiar with OpenBLAS (or FlexiBLAS) internals and build systems...
12 seems to be the magical number here. I've reproduced the crash by setting
Got it down to a segfault in kernel/generic/zgemm_beta.c line 105 during a bisect (suggesting that the second of the two values being processed two-at-a-time in that part of the unrolled loop is already nonexistent).
Bisect puts it down to #4655 "Expanding the scope of 2D thread distribution to improve multithreaded DGEMM performance" (51ab190). I need more time to understand whether that PR is actually at fault here (and maybe some of its performance improvement can be salvaged by limiting it to non-complex cases or certain thread counts), or whether it only exposes a flaw in (gcc 14's optimization of) the generic C gemm beta code.
pragma GCC optimize O0 in zgemm_beta.c does not help, so this is probably not a gcc 14 optimizer bug. Checking the thread redistribution produced by Yamazaki's PR now to see if it does anything interesting at the time of the crash.
Looks like the while loop in zgemm_beta.c can cause an additional roundtrip... still testing my "fix" though... |
Can you please give #5057 a spin ? |
I'm still seeing the crash with that patch. |
Yes, the zeroing loop in that function must be patched too. I did that here and the crash disappears. |
With the update from 0.3.26 to 0.3.28 in Fedora we're starting to see the following lapack test failure on aarch64 only: