Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test failures on Strawberry Perl 5.40.0 #522

Closed
shawnlaffan opened this issue Jan 13, 2025 · 45 comments
Closed

test failures on Strawberry Perl 5.40.0 #522

shawnlaffan opened this issue Jan 13, 2025 · 45 comments

Comments

@shawnlaffan
Copy link
Contributor

Building using Strawberry Perl 5.40.0 PDL edition, using current head (b7980e4) generates the following failures.

Note that this version of Strawberry Perl comes with PDL 2.089.

PDL::IO::GD was split off recently but has not yet been separately installed on this system. Hence the available version is from PDL 2.089. Perhaps the GD tests need to be version-gated to only run with PDL-IO-GD 2.101 or later?

I'm not sure about the other two. I'll add more details if I get a chance to drill into them.

t/compression.t .................. 1/? Error in slice:slice ends out of bounds in pos 0 (end is 65; source dim 0 runs 0 to 64) at lib/PDL/Compression.pm line 153.
        PDL::rice_compress(PDL=SCALAR(0x1d2c95ebea8), 32) called at t/compression.t line 23
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 5.
t/compression.t .................. Dubious, test returned 255 (wstat 65280, 0xff00)
All 5 subtests passed
t/flexraw_fortran.t .............. ExtUtils::F77: Unable to guess and/or validate system/compiler configuration
ExtUtils::F77: Will try system=MinGW Compiler=GFortran
t/flexraw_fortran.t .............. 7/? 'C:\Users\user\AppData\Local\Temp\rawpkAs' is not recognized as an internal or external command,
operable program or batch file.
ERROR: code did not create data file C:\Users\user\AppData\Local\Temp\rawpkAs_data
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 2 just after 12.
t/flexraw_fortran.t .............. Dubious, test returned 2 (wstat 512, 0x200)
All 12 subtests passed

t/pic_16bit.t .................... 1/? Can't locate object method "to_rpic" via package "PDL::IO::GD" at C:\git\pdl_repos\pdl\blib\lib/PDL/IO/Pic.pm line 97.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 25 just after 4.
t/pic_16bit.t .................... Dubious, test returned 25 (wstat 6400, 0x1900)
All 4 subtests passed
@shawnlaffan
Copy link
Contributor Author

the t/flexraw_fortran.t failures are probably also due to the already-installed version. These are skipped when PDL::Slatec is not installed, which is the case for a clean install such as under CI.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

The GD tests are part of the PDL::IO::GD distro. Are you running tests from an old PDL, and yet don't have that old PDL available to execute those tests? If so, what could anyone change in new software to make that scenario not horribly fail?

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

Also, as far as I know, nothing tests GD capability without first making sure it's available. I was pretty careful about that in the recent PDL::IO::Pic updates to enable using GD to read JPEGs so Windows could run the TriD demo. I'm looking at line 97 of P:IO:Pic, and it's only a little below line 89, which puts that behind an eval {require}. Can you help me understand how it got past that?

@shawnlaffan
Copy link
Contributor Author

I'm running from a git checkout from my fork, synched to PDL master.

There are version checks but they need to be more stringent. This seems to get past the flexraw and pic test failures:

diff --git a/lib/PDL/IO/Pic.pm b/lib/PDL/IO/Pic.pm
index d3ac2aac..7a84f941 100644
--- a/lib/PDL/IO/Pic.pm
+++ b/lib/PDL/IO/Pic.pm
@@ -86,7 +86,7 @@ sub init_converter_table {
   $Dflags = '';
   %converter = ();

-  if (eval {require PDL::IO::GD; 1}) {
+  if (eval {require PDL::IO::GD; $PDL::IO::GD::VERSION ge '2.101' ? 1 : 0}) {
     $converter{JPEG} = {referral => {
       put => sub {
         my $pdl = $_[0];
diff --git a/t/flexraw_fortran.t b/t/flexraw_fortran.t
index 7923c257..cbf6d270 100644
--- a/t/flexraw_fortran.t
+++ b/t/flexraw_fortran.t
@@ -26,7 +26,7 @@ $Verbose |= $PDL::Verbose;
 my $null = ' 2>' . File::Spec->devnull;

 my $datalen = length($data);
-eval "use PDL::Slatec";
+eval "use PDL::Slatec 2.096";
 plan skip_all => "Skipped tests as no Slatec" if $@;
 plan skip_all => "temp file path too long for f77 ($datalen chars), skipping all tests"
   if $datalen > 70;

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

I won't accept those, I'm afraid, without properly understanding how your system can partially load those things and have to be stopped by a version check. Routinely accepting that sort of change would quickly turn into a rickety maintenance nightmare. That won't be happening.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

However! It has just occurred to me that the IO::Pic bit of code could do a PDL::IO::GD->can on both subs used in the code section below, which would be the correct test to do, and is maintainable right next to the code that uses it. I would be pleased to accept that :-)

@shawnlaffan
Copy link
Contributor Author

These fix the symptoms, but of course might not be a solution.

To paraphrase the original comment, this is a system which already has PDL::IO::GD and PDL::Slatec installed as part of PDL 2.089. When a current PDL is built it does not contain those libraries, however they are in the perl site/lib dir given they are distributed with Strawberry Perl PDL edition.

When t/flexraw_fortran.t is run it loads the version of PDL::Slatec that was installed with PDL 2.089. For reasons I have not identified yet, this does not play nicely with the tests.

When PDL::IO::Pic is loaded it tries to load PDL::IO::GD, which also comes from the already-installed version 2.089. The to_rpic version was added in PDL 2.093 so is not available. The check could run a can call on the method instead of a version check - that would be more self-documenting. (And I see that you suggested this while I was writing this, so it has my vote).

@shawnlaffan
Copy link
Contributor Author

Updated diff for PDL::IO::Pic below. If it looks good then I'll PR it.

The PDL::Slatec check still needs thought.

diff --git a/lib/PDL/IO/Pic.pm b/lib/PDL/IO/Pic.pm
index d3ac2aac..c36d9697 100644
--- a/lib/PDL/IO/Pic.pm
+++ b/lib/PDL/IO/Pic.pm
@@ -86,7 +86,7 @@ sub init_converter_table {
   $Dflags = '';
   %converter = ();

-  if (eval {require PDL::IO::GD; 1}) {
+  if (eval {require PDL::IO::GD; PDL::IO::GD->can ('to_rpic') && PDL::IO::GD->can ('write_Jpeg')}) {
     $converter{JPEG} = {referral => {
       put => sub {
         my $pdl = $_[0];

@shawnlaffan
Copy link
Contributor Author

FWIW, the t/compression.t failures are unrelated to a previously installed PDL. I get the same failures using a clean Strawberry Perl 5.40 without PDL pre-installed.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

I'll make your IO::Pic bit into a commit (with your name on it) now, thank you for the effort.

t/flexraw_fortran.t is a bit of an atavistic throwback, and has been annoying me conceptually for a couple of weeks. It really has no right to be making PDL installation fail because of some failure to compile Fortran stuff. I'm going to remove it as a test, and add the Fortran snippet as a curiosity piece in the IO::FlexRaw docs.

I am somewhat concerned about the Compression failure, and will see if I can repro locally.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

As shown by the above commits, the IO::Pic module is now fixed (I repro-ed on my system, and checked the fix worked), and t/flexraw_fortran.t is gone.

Once I have more information on that Compression failure, then I can comment better. Could you add some instrumentation to it, to make it dump the full $len ndarray, and the info of the data array? It's working here on my Perl 5.32 Strawperl.

@shawnlaffan
Copy link
Contributor Author

As shown by the above commits, the IO::Pic module is now fixed (I repro-ed on my system, and checked the fix worked), and t/flexraw_fortran.t is gone.

Thanks for sorting that out. Removing flexraw makes sense given PDL:Slatec is deprecated anyway.

Once I have more information on that Compression failure, then I can comment better. Could you add some instrumentation to it, to make it dump the full $len ndarray, and the info of the data array? It's working here on my Perl 5.32 Strawperl.

The current version of compression.t fails for PDL 2.089 so it might be a long-standing issue exposed by new tests. Or it is failing in new ways. See below.

I extracted the current t/compression.t to a separate file, updated it to look for m51.fits in the same directory, and installed Test::PDL on my SP 5.40 PDL instance. PDL is 2.089, Test::PDL is 0.21 (last version not bundled with PDL core).

I can't see any functional changes to Test::PDL since 0.21 but it's been moved around a bit since it was moved into core PDL so I might have missed things.

There have, however, been changes to Compression.pm and the rice handling code.

ok 1 - right maximum length
Argument "[286 293 302 301 307 304 306 323 312 320 312 314 325 324..." isn't numeric in subroutine entry at comp.t line 11.
ok 2 - no error
not ok 3 - decompress same as original
#   Failed test 'decompress same as original'
#   at comp.t line 12.
#     dimensions do not match in extent
#          got: Long     D [384,384]  (P    ) TOO LONG TO PRINT
#     expected: Long     D [0,384]    (-    ) Empty[0x384]
ok 4 - no error
not ok 5 - decompress correct version gives right answer
#   Failed test 'decompress correct version gives right answer'
#   at comp.t line 19.
#     dimensions do not match in extent
#          got: Byte     D [20,1]     (P    )
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123]
# ]
#
#     expected: Byte     D [64,1]     (P    )
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123 122 122 121 121 120 120 119 119 118 118 117 117 118 118 117 116 115 114 113 113 116 115 115 114 114 113 112 112 111 111 110 110 110 110 110 110 109 109 110 110 110 111 111 111]
# ]
Argument "[41]" isn't numeric in subroutine entry at comp.t line 21.
ok 6 - no error
not ok 7 - decompress same as original (2)
#   Failed test 'decompress same as original (2)'
#   at comp.t line 23.
#     dimensions do not match in extent
#          got: Byte     D [0,1]      (-    ) Empty[0x1]
#     expected: Byte     D [64,1]     (P    )
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123 122 122 121 121 120 120 119 119 118 118 117 117 118 118 117 116 115 114 113 113 116 115 115 114 114 113 112 112 111 111 110 110 110 110 110 110 109 109 110 110 110 111 111 111]
# ]
1..7
# Looks like you failed 3 tests of 7.

@shawnlaffan
Copy link
Contributor Author

More historical info: The compression.t test file also fails for SP 5.36.1 with PDL 2.084. However, the errors differ.

ok 1 - right maximum length
not ok 2 - no error
#   Failed test 'no error'
#   at comp.t line 12.
#          got: 'slice: unexpected '2' in slice specifier (arg 1) at slices.pd line 2463.
#       PDL::slice(PDL=SCALAR(0x1f5719e1bb0), "(0),*[286 293 302 301 307 304 306 323 312 320 312 314 325 324"...) called at Compression.pm line 192
#       PDL::rice_expand(PDL=SCALAR(0x1f5719e1bb0), PDL=SCALAR(0x1f571613570), 384) called at comp.t line 11
#       eval {...} called at comp.t line 11
# '
#     expected: ''
ok 3 - no error
not ok 4 - decompress correct version gives right answer
#   Failed test 'decompress correct version gives right answer'
#   at comp.t line 19.
#     dimensions do not match in extent
#          got: Byte     D [20,1]     (P    )
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123]
# ]
#
#     expected: Byte     D [64,1]     (P    )
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123 122 122 121 121 120 120 119 119 118 118 117 117 118 118 117 116 115 114 113 113 116 115 115 114 114 113 112 112 111 111 110 110 110 110 110 110 109 109 110 110 110 111 111 111]
# ]
not ok 5 - no error
#   Failed test 'no error'
#   at comp.t line 22.
#          got: 'slice: unexpected '4' in slice specifier (arg 1) at slices.pd line 2463.
#       PDL::slice(PDL=SCALAR(0x1f571ba90f0), "(0),*[41]") called at Compression.pm line 192
#       PDL::rice_expand(PDL=SCALAR(0x1f571ba90f0), PDL=SCALAR(0x1f571ba7170), 64) called at comp.t line 21
#       eval {...} called at comp.t line 21
# '
#     expected: ''
not ok 6 - decompress same as original (2)
#   Failed test 'decompress same as original (2)'
#   at comp.t line 23.
#     received value is not an ndarray
#          got: undef
#     expected: Byte     D [64,1]     (P    )
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123 122 122 121 121 120 120 119 119 118 118 117 117 118 118 117 116 115 114 113 113 116 115 115 114 114 113 112 112 111 111 110 110 110 110 110 110 109 109 110 110 110 111 111 111]
# ]
1..6
# Looks like you failed 4 tests of 6.

@shawnlaffan
Copy link
Contributor Author

Sorry, neglected to add the debugging steps.

These are the dump and info of $len using current HEAD:

t\compression.t .. # [286 293 302 301 307 304 306 323 312 320 312 314 325 324 327 324 326 327 331 324 322 336 322 329 326 326 345 339 335 335 339 356 339 341 350 355 354 352 348 346 353 344 351 356 354 359 356 354 357 357 367 362 359 351 361 367 356 350 350 361 353 354 351 341 354 353 355 348 343 349 349 343 344 349 352 346 347 338 347 337 353 346 338 345 336 340 349 342 352 348 347 348 354 359 355 363 357 363 349 358 359 347 350 344 347 346 339 350 331 333 328 338 339 341 335 334 344 337 335 338 338 343 339 345 332 329 338 333 338 345 344 346 338 339 337 347 331 338 336 332 350 345 341 349 350 350 351 355 356 360 358 359 362 359 365 358 354 354 354 360 352 364 356 355 358 357 352 353 360 360 361 354 354 362 351 353 353 352 358 350 353 351 346 365 359 358 356 360 353 351 353 354 364 347 350 348 345 358 352 348 351 355 365 365 362 349 356 355 361 359 363 368 346 347 347 346 361 351 358 354 348 370 356 357 356 373 356 356 354 344 353 355 359 358 360 364 358 355 355 349 357 357 360 354 356 340 343 353 352 361 354 355 352 351 357 349 349 347 347 358 350 342 349 350 339 343 348 347 361 358 359 358 344 350 341 342 345 338 357 350 348 349 357 350 347 342 334 343 344 335 337 337 345 337 340 340 341 349 351 341 340 355 341 345 341 344 350 355 349 346 345 352 339 338 345 344 353 346 342 338 343 334 331 350 346 353 345 339 342 346 352 351 341 346 352 348 338 337 340 350 330 328 319 326 331 327 321 321 321 333 318 325 316 327 321 307 312 310 323 313 318 323 320 330 315 318 319 317 316 299 305 297 307 303 295 291 288 293 285 285 283 279 293 263]
# PDL: Indx D [384]

@shawnlaffan
Copy link
Contributor Author

The failure is in the rice_compress call.

prove -bvm t\compression.t
t\compression.t ..
ok 1 - right maximum length
ok 2 - no error
ok 3 - decompress same as original
ok 4 - no error
ok 5 - decompress correct version gives right answer
Error in slice:slice ends out of bounds in pos 0 (end is 65; source dim 0 runs 0 to 64) at lib/PDL/Compression.pm line 153.
        PDL::rice_compress(PDL=SCALAR(0x19f0ad3bdb0), 32) called at t\compression.t line 26
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 5.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 5 subtests passed

Test Summary Report
-------------------
t\compression.t (Wstat: 65280 (exited 255) Tests: 5 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=1, Tests=5,  0 wallclock secs ( 0.06 usr +  0.03 sys =  0.09 CPU)
Result: FAIL

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

As shown by the above commits, the IO::Pic module is now fixed (I repro-ed on my system, and checked the fix worked), and t/flexraw_fortran.t is gone.

Thanks for sorting that out. Removing flexraw makes sense given PDL:Slatec is deprecated anyway.

To avoid confusion: IO::FlexRaw is useful beyond Fortran, and it's staying. But there's another test for it that doesn't use Fortran.

@mohawk2
Copy link
Member

mohawk2 commented Jan 13, 2025

The newer test failing with older PDL-with-that-Compression is no surprise, because I found and fixed a bug in Compression recently. Can you confirm when you're running the latest test against the latest PDL, that it's not loading the old Compression?

@shawnlaffan
Copy link
Contributor Author

I checked %INC and both PDL and PDL::IO::FITS are coming from the blib dir, as per prove -b.

@mohawk2
Copy link
Member

mohawk2 commented Jan 14, 2025

OK. I was asking about PDL::Compression, not about either of those things?

@shawnlaffan
Copy link
Contributor Author

All are from blib.

@shawnlaffan
Copy link
Contributor Author

Updated diagnostics given it was the second call that failed, not the first.

prove -bvm comp.t
comp.t ..
ok 1 - right maximum length
ok 2 - no error
ok 3 - decompress same as original
ok 4 - no error
ok 5 - decompress correct version gives right answer
#
# [
#  [126 122 122 128 128 124 124 128 128 128 127 126 126 127 127 128 124 124 123 123 122 122 121 121 120 120 119 119 118 118 117 117 118 118 117 116 115 114 113 113 116 115 115 114 114 113 112 112 111 111 110 110 110 110 110 110 109 109 110 110 110 111 111 111]
# ]
# PDL: Byte D [64,1]
Error in slice:slice ends out of bounds in pos 0 (end is 65; source dim 0 runs 0 to 64) at lib/PDL/Compression.pm line 153.
        PDL::rice_compress(PDL=SCALAR(0x29d02fc55d0), 32) called at comp.t line 33
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 5.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 5 subtests passed

Test Summary Report
-------------------
comp.t (Wstat: 65280 (exited 255) Tests: 5 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=1, Tests=5,  1 wallclock secs ( 0.09 usr +  0.02 sys =  0.11 CPU)
Result: FAIL

@shawnlaffan
Copy link
Contributor Author

Another data point, but the issue disappears when the failing input ndarray is converted from byte to long type.

For reference:

diff --git a/t/compression.t b/t/compression.t
index 51de80a1..7c12ef79 100755
--- a/t/compression.t
+++ b/t/compression.t
@@ -20,7 +20,13 @@ my $compressed_correct = pdl(byte, '[[126 48 24 0 96 48 14 179 32 54 219 109 147
 my $got = eval { $compressed_correct->rice_expand($compressed_correct->dim(0), 64) };
 is $@, '', 'no error';
 is_pdl $got, $expected, 'decompress correct version gives right answer';
+
+$expected = pdl long, $expected->unpdl;
+diag $expected;
+diag $expected->info;
+
 ($y, $xsize, undef, $len) = $expected->rice_compress(32);
 $got = eval { $y->rice_expand($len, $xsize) };
 is $@, '', 'no error';
 is_pdl $got, $expected, 'decompress same as original (2)';

@mohawk2
Copy link
Member

mohawk2 commented Jan 14, 2025

OK, that feels like progress. Does the same thing happen on different versions of Perl? Which compiler is being used? I can imagine some numerical edge-case being what triggered the bug that the test-change captured and fixed (which I did just by updating the RICE code from CFITSIO), and that being treated differently by different compilers. I'm using Strawperl 5.32's MinGW, it has GCC 8.3.0.

By the way, a slightly less roundabout way of converting that $expected is just to say $expected = $expected->long. Unless that triggers some other conversion type bug, which has been known, but if I don't know, I can't fix it!

@shawnlaffan
Copy link
Contributor Author

OK, that feels like progress. Does the same thing happen on different versions of Perl? Which compiler is being used? I can imagine some numerical edge-case being what triggered the bug that the test-change captured and fixed (which I did just by updating the RICE code from CFITSIO), and that being treated differently by different compilers. I'm using Strawperl 5.32's MinGW, it has GCC 8.3.0.

gcc version 13.2.0 (MinGW-W64 x86_64-ucrt-posix-seh, built by Brecht Sanders, r8)

I can provide more details if needed.

By the way, a slightly less roundabout way of converting that $expected is just to say $expected = $expected->long. Unless that triggers some other conversion type bug, which has been known, but if I don't know, I can't fix it!

I figured there was an easier way. (I dropped into the docs a bit below where that's documented).

Testing further, the code also fails for types short and ushort.

@mohawk2
Copy link
Member

mohawk2 commented Jan 14, 2025

OK, that feels like progress. Does the same thing happen on different versions of Perl? Which compiler is being used? I can imagine some numerical edge-case being what triggered the bug that the test-change captured and fixed (which I did just by updating the RICE code from CFITSIO), and that being treated differently by different compilers. I'm using Strawperl 5.32's MinGW, it has GCC 8.3.0.

gcc version 13.2.0 (MinGW-W64 x86_64-ucrt-posix-seh, built by Brecht Sanders, r8)

I can provide more details if needed.

That's an answer to the second question. How about the first? :-) Especially, do you get the same results with 5.32 as I do (i.e. it passes with latest git master)?

By the way, a slightly less roundabout way of converting that $expected is just to say $expected = $expected->long. Unless that triggers some other conversion type bug, which has been known, but if I don't know, I can't fix it!

I figured there was an easier way. (I dropped into the docs a bit below where that's documented).

Testing further, the code also fails for types short and ushort.

This does sound a bit like some C compiler stuff. Let's see what the results of the previous point indicate.

@shawnlaffan
Copy link
Contributor Author

SP 5.32 passes with git master.

SP 5.36 fails (it uses gcc version 13.1.0 (MinGW-W64 x86_64-msvcrt-posix-seh, built by Brecht Sanders)).

If anyone following along wants to try replicating the issue, Strawberry Perl can be downloaded from https://strawberryperl.com/releases.html. Download and unzip a portable version, and then run portableshell.bat at its top level to get a shell with the various paths and other env vars set.

@shawnlaffan
Copy link
Contributor Author

shawnlaffan commented Jan 14, 2025

Some more data using SP 5.40.

Tests fail using OPTIMIZE set to -Os but pass when it is set to -O1. (Edit: this is just t/compression.t).

SP 5.40 uses -Os with some extra flags. Previous discussion in StrawberryPerl/Perl-Dist-Strawberry#93 and Perl/perl5#20081 (comment).

perl -V | findstr optim
    optimize='-Os -falign-functions -falign-jumps -falign-labels -falign-loops -freorder-blocks -freorder-blocks-algorithm=stc -freorder-blocks-and-partition'

@sisyphus
Copy link
Contributor

sisyphus commented Jan 15, 2025

As regards t/pic_16bit.t on SP-5.40.0, I notice that $can_jpg has changed from a false value (in PDL-2.090) to a true value in PDL-2.098.
If I run t/pic_64bit.t t/pic_16bit.t against PDL-2.090, then all (4) tests pass because the problematic tests are skipped.
Is it correct that, with PDL-2.098, $can_jpg is set to a true value ? ... or is that the bug ?

@mohawk2
Copy link
Member

mohawk2 commented Jan 15, 2025

As regards t/pic_16bit.t on SP-5.40.0, I notice that $can_jpg has changed from a false value (in PDL-2.090) to a true value in PDL-2.098. If I run t/pic_64bit.t t/pic_16bit.t against PDL-2.090, then all (4) tests pass because the problematic tests are skipped. Is it correct that, with PDL-2.098, $can_jpg is set to a true value ? ... or is that the bug ?

The bug was that PDL::IO::Pic was loading PDL::IO::GD, and assuming that it was a very recent one with new functionality that enabled JPEGs to be read and written on platforms without NetPBM, including Strawberry. That has now been fixed. If you want JPEGs readable by PDL post 2.096, you have to install PDL::IO::GD which is now an external module. Until I added that to PDL::IO::GD, Strawberry could not read JPEGs at all.

@mohawk2
Copy link
Member

mohawk2 commented Jan 15, 2025

Some more data using SP 5.40.

Tests fail using OPTIMIZE set to -Os but pass when it is set to -O1. (Edit: this is just t/compression.t).

SP 5.40 uses -Os with some extra flags. Previous discussion in StrawberryPerl/Perl-Dist-Strawberry#93 and Perl/perl5#20081 (comment).

perl -V | findstr optim
    optimize='-Os -falign-functions -falign-jumps -falign-labels -falign-loops -freorder-blocks -freorder-blocks-algorithm=stc -freorder-blocks-and-partition'

Thanks. I just had a look at https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html, and it says -Os is -O2 with some extra stuff, and I see you're adding more extra stuff also. Could you confirm that recompiling lib/PDL/Compression/ricecomp.c with -O1 makes it pass? And then (probably with a script) keep recompiling just that file with additional flags until it fails? I believe this would work for the link/test part (as I just tried it here):

gmake blib\arch\auto\PDL\Compression\Compression.xs.dll & prove -b t\compression.t

That will inform what adjustments are likely to fix things. I wouldn't be surprised at all if a volatile is needed in there somewhere to stop GCC optimisation breaking things, as is its habit.

@shawnlaffan
Copy link
Contributor Author

The -Os set of flags is for the most part a subset of the ones in -O2, plus -finline-functions and a few that reduce code size.

Of those extra ones used for Strawberry Perl, the one I can see missing from the list in the GCC docs is -fprefetch-loop-arrays. So SP is very nearly using -O2.

Will have a go at the compilation. Hopefully it is not an interaction between two or more optimisations.

@mohawk2
Copy link
Member

mohawk2 commented Jan 15, 2025

The -Os set of flags is for the most part a subset of the ones in -O2, plus -finline-functions and a few that reduce code size.

Of those extra ones used for Strawberry Perl, the one I can see missing from the list in the GCC docs is -fprefetch-loop-arrays. So SP is very nearly using -O2.

Will have a go at the compilation. Hopefully it is not an interaction between two or more optimisations.

I am definitely hoping that also! But we can check that as a further step. My thought for how to go quickly is to binary-search, so make a full list of the optimisation flags @flags, then keep halving along that till you find a pair of adjacent indices that breaks/doesn't break. Then take the added flag, compile it with -O1 plus that one, and see if that still breaks (which hopefully will be the case, i.e. no interactions).

If there are interactions, then that first one could be moved to the start, and the binary search could be done again between indices 1 (instead of 0) and $#flags. Computer science, baby!

@shawnlaffan
Copy link
Contributor Author

shawnlaffan commented Jan 16, 2025

It looks like the cause is -f-inline-functions, but only under -Os since it is listed as being part of -O2 in the docs (GCC13 specific docs: https://devdocs.io/gcc~13/optimize-options).

The compression tests pass on my system when using the commands below. It would be good if others could try to replicate.

perl Makefile.PL "OPTIMIZE=-Os -fno-inline-functions"
gmake basic -j8
prove -bvm t\compression.t

Hopefully someone has an idea why this argument triggers the fail under -Os. Possibly it is an interaction with the "further optimizations designed to reduce code size" noted in the docs but unfortunately these are not listed.

Interestingly, all PDL tests pass when using -O2. This was not previously the case, although since then Strawberry Perl has shifted to a more recent GCC using UCRT. Testing with SP 5.36 should shed light there.

For those interested in the process, I ended up running the full Makefile.pl/gmake process with OPTIMIZE=-O1 plus all, the first or the second half of the -O2 set. That made no difference, which left removal of -f-inline-functions from -Os.

@shawnlaffan
Copy link
Contributor Author

Testing with SP 5.36 should shed light there.

It still fails for 5.36, with the default -Os and with "OPTIMIZE=-Os -fno-inline-functions".

Its GCC is gcc version 13.1.0 (MinGW-W64 x86_64-msvcrt-posix-seh, built by Brecht Sanders).

@sisyphus
Copy link
Contributor

perl Makefile.PL "OPTIMIZE=-Os -fno-inline-functions"
gmake basic -j8
prove -bvm t\compression.t

I tried (a copy'n'paste of) that mantra using freshly unpacked PDL-2.098 source, but it still failed for me:

prove -bvm t\compression.t
t\compression.t ..
ok 1 - right maximum length
ok 2 - no error
ok 3 - decompress same as original
ok 4 - no error
ok 5 - decompress correct version gives right answer
Error in slice:slice ends out of bounds in pos 0 (end is 65; source dim 0 runs 0 to 64) at lib/PDL/Compression.pm line 147.
        PDL::rice_compress(PDL=SCALAR(0x23de02abd08), 32) called at t\compression.t line 23
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 5.
Dubious, test returned 255 (wstat 65280, 0xff00)
All 5 subtests passed

Test Summary Report
-------------------
t\compression.t (Wstat: 65280 (exited 255) Tests: 5 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=1, Tests=5,  1 wallclock secs ( 0.06 usr +  0.00 sys =  0.06 CPU)
Result: FAIL

That was running SP-5.40.0.1, on Windows 11 :

Perl executable: C:\sp\_64\sp-5.40.0.1-PDL\perl\bin\perl.exe
Perl version   : v5.40.0 / MSWin32-x64-multi-thread
PDL version    : 2.090

I can see the -Os -fno-inline-functions in the gcc commands that are being run, and I haven't spotted any other optimization switches in there.

@shawnlaffan
Copy link
Contributor Author

I tried (a copy'n'paste of) that mantra using freshly unpacked PDL-2.098 source, but it still failed for me:

Drat. Same here.

It might have been an insufficiently clean tree. I was using gmake clean, working in a git checkout. Will now try with gmake realclean.

@shawnlaffan
Copy link
Contributor Author

shawnlaffan commented Jan 16, 2025

I've updated my script to use gmake realclean and to also delete some files that misses (but which were unlikely to be affecting anything anyway - lib/PDL/{Primitive-pchip.o,Ufunc-pp-diffover.c,Ufunc-pp-diffover.o}).

I then ran a few combinations with the following results:

  • -O2 passes.
  • -Os fails.
  • -Os -fno-inline-functions fails
  • -O1 plus all the -O2 optimisations passes.
  • -O1 plus all the -Os optimisations passes.
  • -O1 plus all the -Os and additional Strawberry Perl optimisations passes.

Which suggests something amiss related to how -Os works.

Edit: In the event it it useful, I've uploaded the script as a gist at https://gist.github.com/shawnlaffan/ed9a0c7d7026e6f1d7b5014d309f51e2

@mohawk2
Copy link
Member

mohawk2 commented Jan 16, 2025

@shawnlaffan Thank you for this investigative work. It's difficult for me to see what I could do in PDL to avoid what looks a great deal like a compiler bug.

Do you agree? With this picture of things, what am I as PDL person supposed to do about this?

@shawnlaffan
Copy link
Contributor Author

It could well be a compiler bug but confirmation of that is beyond me.

There are several possibilities I see to get PDL to install "out of the box" on Strawberry Perl. These are not mutually exclusive.

  1. Compile Strawberry Perl with -O2, which will then propagate through to PDL. This might still cause issues with other modules, and also won't fix installation of PDL for users of already-released versions of Strawberry Perl 5.36, 5.38 or 5.40. I'll likely try this anyway.
  2. Compile the rice compression stuff with specific optimisations, e.g. -O1 or -O2. This previous comment indicates it should be possible?
  3. Conditionally use OPTIMIZE=-O2 in Makefile.PL when run using Strawberry Perls of the appropriate version range (SP can be checked for using $Config{myuname} =~ /strawberry-perl/). Or generalise to check for windows and gcc13. This would also need to not override a command line OPTIMIZE= argument.

The third option is a potential maintenance burden, although it would be a "set once" thing if the first option works.

It would be good to try the second option if it is possible. If it works then that part could be conditional on versions of SP rather than all of PDL in option three. Pointers welcome.

@sisyphus
Copy link
Contributor

sisyphus commented Jan 17, 2025

We switched from using -O2 to -Os because -O2 started failing (threads-related) tests. It will be most noteworthy if you find that you can successfully get perl to pass its test suite using -O2 and a recent version of gcc.

ASIDE: "-O2" is fine with unthreaded builds, and also with 32-bit builds.

The third option works for me, and strikes me as being the best solution if the first option is unworkable. I think it should apply to all Windows builds, irrespective of gcc version. If "OPTIMIZE" doesn't include "-Os", I think it will already be -O2 anyway.

Perhaps, instead of building perl with "-Os plus allowable additions" we should be looking at building with "-O2 minus necessary deletions" ?

I don't usually build PDL, but I've just built PDL-2.098 on my MSWin32-x64-multi-thread perl-5.41.7 and all tests pass - no interference needed.
And this is a perl that has the same OPTIMIZE setting as StrawberryPerl-5.40.0 ... so it was a bit surprising that t/compression.t passed it's tests.
Also:

> perl -V:gccversion
gccversion='14.2.0';

I've now just done a second build using perl Makefile.PL "OPTIMIZE=-O2", and it also worked perfectly.

Maybe the optimization bug SP is up against has been fixed in gcc-14.
In any case, I see no issue with specifying OPTIMIZE=-O2for the PDL build.

@shawnlaffan
Copy link
Contributor Author

shawnlaffan commented Jan 17, 2025

Perhaps, instead of building perl with "-Os plus allowable additions" we should be looking at building with "-O2 minus necessary deletions" ?

That makes sense, particularly given there is only one -O2 optimisation SP is not using. I'll try that for the next SP builds. StrawberryPerl/Perl-Dist-Strawberry#231

I suspect you're right that the bug is fixed in gcc-14. I'm reluctant to shift to gcc-14 for SP 5.40, partly for continuity and also given Perl needs to be patched (I think?), but will contemplate a more recent gcc-13. Discussion of that can be done on the relevant Strawberry Perl issue tracker.

@mohawk2
Copy link
Member

mohawk2 commented Jan 17, 2025

Since this looks like it's not a PDL issue as such, please could you close this issue? That's so that you could reopen it if it turns out PDL can be changed. If I close it, you can't.

@shawnlaffan
Copy link
Contributor Author

There still needs to be some resolution of items 2 and 3 from #522 (comment). Both of those would involve changes to PDL but would allow it to build without needing command line arguments.

@mohawk2
Copy link
Member

mohawk2 commented Jan 17, 2025

I saw those. I'm not accepting that as a responsibility that PDL has to maintain. If Strawberry needs to do a local patch to the Makefile.PL and/or adjust Makefile or command line in order to build on a visibly broken compiler, then Strawberry can do that. There isn't anything wrong with the code as distributed.

@shawnlaffan
Copy link
Contributor Author

OK.

The SP build system can set optimisation levels and other parameters so does not need to patch anything for future builds.

If anyone has suggestions for option 2 then I'd still be interested for general knowledge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants