Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

github actions failing under PDL-2.096 #18

Closed
moocow-the-bovine opened this issue Jan 2, 2025 · 13 comments
Closed

github actions failing under PDL-2.096 #18

moocow-the-bovine opened this issue Jan 2, 2025 · 13 comments

Comments

@moocow-the-bovine
Copy link
Owner

moocow-the-bovine commented Jan 2, 2025

github actions are failing for PDL-CCS commit a9a4882 under PDL-2.096 only

I could not reproduce this locally using the PDL git 2.096 tag PDLPorters/pdl@87bed59.

Problem seems to be changes to the output type & BAD-handling of PDL core's CCS's bandover and borover with respect to PDL core. I think the CCS BAD-handling is right, but output type is defaulting to sbyte rather than longlong.

@mohawk2 any ideas?


for the record, failing tests look like:

test_ufunc(bandover, BAD)
not ok 115 - bandover:missing=BAD:type

#   Failed test 'bandover:missing=BAD:type'
#   at CCS/t/03_ufuncs.t line 103.
#          got: 'sbyte'
#     expected: 'longlong'
# ccs_rc(PDL: SByte D [7])=PDL::CCS::Nd:sbyte(6)
#   pdims:indx(1)=[6]
#   vdims:indx(1)=[0]
#   which:indx(1,6)^T=   
#    [
#     [0 1 2 3 4 5]
#    ]
#   vals:sbyte(7)=[10 1 0 0 0 0 BAD]
#   missing:sbyte(1)=[BAD]
# 
# dense_rc(PDL: LongLong D [6])=[10 1 0 0 0 0]
@moocow-the-bovine
Copy link
Owner Author

... might have to do with the CCS functions' GenericTypes, which gets set to this:

[map {$_->ppsym} grep {$_->integer} PDL::Types::types()]

... have there been (non-deterministic) changes to this order? Or to the way GenericTypes is interpreted if type-demotion is in effect? (I think the tests are just passing in double inputs and relying on auto-magic conversion).

@moocow-the-bovine
Copy link
Owner Author

moocow-the-bovine commented Jan 2, 2025

I could not reproduce this locally using the PDL git 2.096 tag PDLPorters/pdl@87bed59.

can reproduce. found workaround (-> 07bb416). still think there's something weird going on.

@mohawk2
Copy link
Collaborator

mohawk2 commented Jan 2, 2025

Without having checked yet, sbyte is the output type of some PDL mask functions. The bitwise and I believe the logical (and2, or2) functions only deal in integer types, which means if they get passed a double, conversion will happen.

One thing that's changed in PDL 2.096 is that where it's an available type, the output Par(s) dictate the GenericType of the xform. Considerable efforts have been made to make that not cause problems and be backward-compatible, but it's conceivable I've missed something. I'll have a look at what's happening with the CCS commits you mention above and see if there's something you can do in CCS to not be broken by this, or if it's something in PDL that needs fixing.

@mohawk2
Copy link
Collaborator

mohawk2 commented Jan 2, 2025

To add to the above: longlong is the last integer type, so previously if any inputs to e.g. bandover were e.g. double, that being a higher-numbered type (currently 10, if you care) than any integer type, would "win" as selecting the xform's GenType. But because that isn't available, the actual selected type would be the last-given one (here, longlong).

The problem will be some combination of these factors, and I'd imagine that being more finicky about passed-in types will lead to success.

That "use the last-given type" thing is one reason why complex types are opt-in; they are numbered higher than all real types, so if they got selected as the GenType of e.g. a newly-complex-capable asin having been passed a Perl scalar that's an integer (say, 2), would return a mathematically correct complex number, rather than the nan that the user (Craig, in this case) had been expecting. In this case, the fix was to make double be the last-given type.

@moocow-the-bovine
Copy link
Owner Author

That "use the last-given type"

Thanks for the clarification! The GenericTypes I'm specifying works out to:

[qw(A B S U L K N P Q)]

... and if I'm reading it right, then Q aka longlong is the last given type. But I seem to be getting sbyte out (aka A) when I (somewhat nonsensically) pass in a double. Have I misunderstood the heuristic, or should I still be getting a longlong out? Or do I need to specify something else?

@mohawk2
Copy link
Collaborator

mohawk2 commented Jan 2, 2025

Can you say what if any PDL xforms you're using, the whole signature of your op, and the types of all the inputs you're giving it?

@moocow-the-bovine
Copy link
Owner Author

moocow-the-bovine commented Jan 2, 2025

signature (ccs_accum_bor)

    indx ixIn(Ndims,NnzIn);
         nzvalsIn(NnzIn);
         missing();
    indx N();
    indx [o]ixOut(Ndims,NnzOut);
         [o]nzvalsOut(NnzOut);
    indx [o]nOut();

minimum broken example

use PDL;
use PDL::CCS::Ufunc;

PDL::_ccs_accum_bor_int(
  $ixIn=PDL->pdl(indx, [[0]]), # indx
  $nzvalsIn=pdl([65535]),      # double
  $missing=0,                  # scalar --> this seems to be the culprit
  $N=0,                        # scalar
  $ixOut=null,                 # null
  $nzvalsOut=null,             # null
  $nOut=null                   # null
);

die("you're too short for this ride") if ($nzvalsOut->type == sbyte);

xforms

no explicit ones, but I am passing in perl scalars for $missing and $N. The "real" missing value is actually BAD in the failing test, but passing in N=0 causes the PP code to ignore it.

As it turns out, if I pass in $missing=pdl(double, 0) or even BAD itself ($missing=pdl(double, 0)->setvaltobad(0)), then I do get a longlong out.

... that suggests a more robust CCS-internal workaround (--> always pass in a $missing of the correct type), but I'm still surprised by the behavior.

@moocow-the-bovine
Copy link
Owner Author

moocow-the-bovine commented Jan 3, 2025

@mohawk2 I'm closing this as resolved by the CCS-internal workaround, but I still believe that the behavior indicates a bug in PDL core's choice of output type. Feel free to investigate ... or not.

@mohawk2
Copy link
Collaborator

mohawk2 commented Jan 3, 2025

@moocow-the-bovine I'm glad you've got a situation that works with both old and new. However, your feeling of surprise at the situation is an important signal, so I'd like to just examine that a bit more, and especially get your input. I'm going to "comment" this now, in case that happens to catch your attention, but I'll carry on adding to it to complete the thoughts.

First, what happened isn't precisely a choice of output type, but of the xform's "generic type" (the value of $GENERIC()), or (more conveniently) its datatype. That matters only in that the internal processing will be done "in" that datatype (based on inputs that might have been converted), then maybe be converted for each output. That might be a good or bad thing, depending on competing needs for numerical precision and performance.

Secondly, the influence on that of a provided Perl scalar is as follows: those get turned into the smallest-numbered datatype for that class of data: Perl IVs, UVs, NVs, and Math::Complex objects each become one of the relevant PDL datatypes (Math::Complex only becomes cdouble). If you'd given a 0.0 instead of 0, that would have become a float, I think - a non-allowed datatype.

Up till 2.096, if any parameter (input or output) was of a non-allowed type, that forced the xform datatype to instead be the last-given type, as discussed above. As of 2.097, as you have found, it only now does that if no allowed type was given; if one was, it will go with that. I'm inclining as a result of your surprise towards switching that logic back so any non-allowed input again does the forcing, and using the test you've given above (I can find a core equivalent) to make that the permanent behaviour.

What do you think?

@moocow-the-bovine
Copy link
Owner Author

@mohawk2
Thanks for the patience & clear explanation, and apologies for my earlier snarkiness (I do that sometimes).

Up till 2.096 [...] the last-given type
As of 2.097 [...] it only now does that if no allowed type was given

💡 ... so scalar 0 being a (sufficiently small?) IV/UV winds up determining the "native" type of the operation ... that makes a lot of sense (sorry if my terminology is incorrect or my conceptualization is off; I've not delved very deeply into the PDL internals).

If you'd given a 0.0 instead of 0, that would have become [...] a non-allowed datatype.

(symptomatically) confirmed; if I pass in $missing=0.0, I get a longlong out.

inclining as a result of your surprise towards switching that logic back [...] What do you think?

It's complicated; it's pretty clear to me that:

  • my test was doing something unrealistic & stupid by passing a double to an integer-types-only operation,
  • the "clever" CCS-internal kludge for $missing=BAD wrongly used a perl scalar instead of the "proper" logical type of the sparse ndarray (here double)

... so I think that:

  1. I was wrong: the "clever" CCS-internal kludge was too clever and/or not clever enough,
  2. this particular change in PDL core behavior is unlikely to bite anyone in the Real World™, and
  3. the CCS-internal workaround (== preserving the correct type for $missing) will now indeed Do The Right Thing™ (i.e. the same thing that the corresponding PDL core functions would do with the equivalent dense ndarray).

All of that is very CCS-centric ... regarding switching that logic back in PDL core, I don't have a strong feeling one way or the other. Some thoughts:

  • Now that I (believe to) understand it better, I actually really like the "no allowed type given" heuristic ... that sounds to me like it should handle more (and more realistic) cases than this one more flexibly & more gracefully than the "always choose the last given type" heuristic.
  • really naive PDL users might not be super-careful about specifying the types of their ndarrays (I remember that it took me a while before I learned to love the pdl(TYPE, ...) syntax). Naive users are also the most likely to pass in scalars and expect PDL to DWIM ... which with the changed heuristic I guess could lead to unexpected results.
  • A lot of the PMCode (and CCS::Nd wrapper code) I just removed from PDL-CCS involved just this sort of logic to determine which type to pre-allocate for an output PDL with zeroes (one such removal led to this issue). The removed code was klunky and ugly and I'm glad it's gone, but the heuristic (which I modeled on my conceptualization of PDL core's behavior) was basically "choose the maximal type among those which were explicitly passed in". In this case, that was the double from $nzvalsIn ... which presumably would lead to invocation of the "unsupported type" clause in PDL core.
  • Maybe there's a way to prevent (atomic) perl scalars from influencing the type-selection process (not counting as "explicitly passed in" in the above sense)? But I'm guessing that the scalar-to-PDL conversion happens first and that when the type unification is being performed we no longer know which types came from PDLs and which from perl scalars, so that might not be an option. I can also imagine that handing Math::Complex scalars (which should probably count as "explicitly passed in") makes that even trickier.

Going a bit more abstract (and harking back to our earlier discussion of "type variables"), I want to conceptualize the "native type of an operation" in terms of a unification operation (== least-upper-bound over a set of (given) types with respect to a bounded complete partial order). That BCPO can be just a chain (e.g. the sort-order < on PDL::Types), but the question for me here is: "what set of types are we doing the LUB over?" ... null obviously shouldn't count, but should perl scalars? If so, what should they count as?

@mohawk2
Copy link
Collaborator

mohawk2 commented Jan 3, 2025

These matters are all discussed somewhat on PDLPorters/pdl#511, I'd encourage you to take a look there too. That includes points about how to deal with Perl scalars, noted in your second and fourth bullet-points.

Your term "native" type of the specific instance of the operation seems to mean the same things as my "generic type" or "datatype". I'd encourage "datatype", but it's up to you.

I think you're likely to be right that it's only CCS that gets affected by the change of behaviour, but since it is a real change of behaviour, that's not notably beneficial, I'm going to change it back. Thank you for your careful (including some bonus highly-abstract!) thoughts.

Regarding your last point, my aim is to just let actual ndarrays passed in determine the type (including "didn't match any, go to last-given"). Then the Perl scalars would get converted appropriately.

I could imagine allowing overrides of type-selection for a given xform (maybe with a TypeSelectCode, like RedoDims), but I'm quite skeptical that the juice is worth the squeeze.

@mohawk2
Copy link
Collaborator

mohawk2 commented Jan 5, 2025

Interestingly, a naive implementation of restoring that pre-2.096 behaviour (if given any non-available) broke some tests because of the Perl scalar issue: $x **= 1 forces to double even if $x was complex (it got converted to double, which is obviously wrong). I had to adjust it so it's only if the non-available type was greater than the last-given to not cause such breakage.

@moocow-the-bovine
Copy link
Owner Author

only if the non-available type was greater

that makes sense to me in terms of a "least upper bound" datatype for the operation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants