I got to talk to @sklam a bit today about our approach here, which was really helpful (thank you!). Here is my summary.
High/low level extensions
He thought that writing as little as possible with the builder API is the right approach, and it is what he would do if he could refactor parts of Numba. So having minimal wrappers around the lib* logic, and then building up the Python APIs on top of them with Numba's high-level extension API, made sense to him. He even showed me `@intrinsic`, which makes this a bit easier (see this commit) by combining typing, the Python function, and lowering into one function.
One issue is that Numba's high-level extension API doesn't support builtin ops like `getitem` and `setitem`. We need them so that the Python XND object can have a `getitem` that just uses already-lowered libxnd functions, without touching the LLVM builder API. But when I looked at how the high-level API works, all it really does is `@jit` the function itself, get a pointer to the jitted function, and use that as the lowered implementation of whatever you are lowering. So he supported my using that approach to hack together a way to lower builtins from the high-level API. Heck, maybe it could even be contributed back to Numba core if it pans out.
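The mechanism described above can be modelled with nothing but `ctypes`. The idea is to compile an ordinary function once, keep a pointer to the compiled code, and use that pointer as the lowering of a builtin such as `getitem`. This is a toy sketch under stated assumptions. A `ctypes.CFUNCTYPE` callback stands in for the function Numba would `@jit`. `LOWERING_TABLE`, `lowered`, and `Int64Buffer` are hypothetical names; they are not Numba or xnd APIs.

```python
import ctypes

# C-level signature of the "already compiled" getitem: (int64 *data, int64 i) -> int64
GetItemFn = ctypes.CFUNCTYPE(ctypes.c_int64, ctypes.POINTER(ctypes.c_int64), ctypes.c_int64)


def _getitem_impl(data, i):
    # Stand-in for the body Numba would compile with @jit (or a libxnd accessor).
    return data[i]


# "Compile" once; keep a module-level reference so the callback is not garbage collected.
_compiled_getitem = GetItemFn(_getitem_impl)

# Hypothetical lowering registry: builtin op name -> raw address of its implementation.
LOWERING_TABLE = {"getitem": ctypes.cast(_compiled_getitem, ctypes.c_void_p).value}


def lowered(op):
    """Rebuild a callable from a stored address, the way generated code calls through a pointer."""
    return GetItemFn(LOWERING_TABLE[op])


class Int64Buffer:
    """Toy array-ish object whose __getitem__ only dispatches to the stored pointer."""

    def __init__(self, values):
        self._n = len(values)
        self._data = (ctypes.c_int64 * self._n)(*values)

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        ptr = ctypes.cast(self._data, ctypes.POINTER(ctypes.c_int64))
        return lowered("getitem")(ptr, i)
```

The object itself never implements element access. It only looks up an address and calls through it, which is the shape of "reuse an already-lowered function" without writing builder code.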
What to put in Numba types?
@pearu and I had talked a bit the day before about whether the Numba types for xnd objects should carry shape/type information (#5). Right now they do not; they are just pointers to memory.
The pro here is that we are very flexible: we can do any indexing/slicing that is possible with the libxnd API. It's also much simpler, since the types exist only at runtime as part of the xnd structs, and we have no knowledge of them at compile time. The con is that Numba has less information for optimizations. In Numba's NumPy support, the dtype and number of dimensions are inferred at compile time, so at runtime they only need to pass around the data pointer (plus shape and strides). However, Siu said that this limits what they can compile: if they can't infer this information statically, the code can't be lowered. With our current approach, it can be.
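The trade-off can be made concrete with a minimal sketch. It assumes a hypothetical `OpaqueArray` class (not an xnd API) that keeps only raw bytes plus a dtype and shape read at run time. Because neither the element type nor the shape is fixed ahead of time, one indexing routine serves every dtype and rank. The price is that offsets and decoding are computed on every access, which is exactly the information a compiler cannot exploit.

```python
import struct
from math import prod

# Hypothetical runtime type table: dtype name -> struct format (little-endian).
_FORMATS = {"int32": "<i", "int64": "<q", "float64": "<d"}


class OpaqueArray:
    """Raw bytes plus a dtype and shape that are only interpreted at run time."""

    def __init__(self, dtype, shape, values):
        self.dtype = dtype
        self.fmt = _FORMATS[dtype]
        self.shape = tuple(shape)
        self.itemsize = struct.calcsize(self.fmt)
        values = list(values)
        if len(values) != prod(self.shape):
            raise ValueError("number of values does not match shape")
        self.raw = bytearray(self.itemsize * len(values))
        for k, v in enumerate(values):
            struct.pack_into(self.fmt, self.raw, k * self.itemsize, v)

    def _offset(self, index):
        # Row-major (C-order) byte offset, computed from the runtime shape.
        if len(index) != len(self.shape):
            raise IndexError(f"expected {len(self.shape)} indices, got {len(index)}")
        flat = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError(index)
            flat = flat * n + i
        return flat * self.itemsize

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)
        (value,) = struct.unpack_from(self.fmt, self.raw, self._offset(index))
        return value
```

For example, `OpaqueArray("int64", (2, 3), range(6))[1, 2]` reads the last element, and the same class handles a 1-D `float64` array with no changes.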
He also thought our approach made sense because it is clean and simple. He said Numba currently has no good way to combine the two approaches, i.e. infer shape/type information where it can and fall back (while still lowering) where it can't.
I was also thinking that while we can do a lot while staying inside xnd containers (applying gumath functions, indexing, slicing, and calling `equals` and `strict_equals`), we might still sometimes want to get back the inner value to operate on it in Numba. One way to do this without static type information (Numba needs to know the type of the returned value) is to tell Numba explicitly what type we are extracting. We would expose a Python function, such as `get_value`, that can be called in a `jit`ed function with an xnd object and an ndt type, like `get_value(x, ndt("int64"))`, and that extracts a Numba int64 from the xnd object. The catch is that we won't know at compile time whether the conversion is valid, so it could fail at runtime if the user asks for a type the xnd object isn't actually holding. I think this is fine: we need this functionality somewhere, and we can always build a higher-level API on top of it with more compile-time checking. It would be a bit like how xnd's Python `value` function works. The `@intrinsic` decorator Siu showed me could be used for this, so that the ndt Python object can be accessed directly at type-inference time as the `value` attribute of the `Const` type. That way, we could register different LLVM extraction code, as well as different Numba return types, for different ndt types.
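A minimal sketch of the proposed contract follows. It assumes plain strings in place of `ndt(...)` objects and a hypothetical `FakeXnd` class in place of a real xnd container; neither is the xnd or ndtypes API. Because the requested type is a constant, it is enough to pick the return type up front (`infer_return_type`). The check that the container really holds that type can only happen at run time, which is the failure mode described above.

```python
import struct

# Hypothetical mapping from type names to their little-endian binary layout.
_LAYOUT = {"int64": "<q", "float64": "<d", "bool": "<?"}


class FakeXnd:
    """Hypothetical stand-in for an xnd container: raw bytes plus a runtime type tag."""

    def __init__(self, type_name, value):
        self.type_name = type_name
        self.raw = struct.pack(_LAYOUT[type_name], value)


def infer_return_type(requested):
    """'Type inference' step: the constant requested type alone decides the result type."""
    return {"int64": int, "float64": float, "bool": bool}[requested]


def get_value(x, requested):
    """Extract a scalar of the caller-declared type; the claim is only checked at run time."""
    if x.type_name != requested:
        raise TypeError(f"xnd object holds {x.type_name!r}, not {requested!r}")
    (value,) = struct.unpack(_LAYOUT[requested], x.raw)
    return value
```

A higher-level API with more compile-time checking could sit on top of `get_value` without changing this low-level contract.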
It wouldn't get us to lowering the `.value` property on regular Python xnd objects exactly, but oh well; for now we can do what we need at a lower level.
Accessing existing C structs
We moved away from explicitly defining the memory layout of existing structs (`xnd_t`, `ndt_t`, etc.) in LLVM. The main reason is that C unions don't have an obvious LLVM struct representation; another is that memory layouts can differ between systems depending on padding, type sizes, etc. Siu said it would probably be OK to define simple structs in LLVM but agreed that unions get tricky. So he supported our current approach of implementing all the getters as C functions and treating the structs as plain arrays of bytes.
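The sketch below uses `ctypes` to show why a hand-written layout is fragile and what the getter-based alternative looks like. It assumes a hypothetical C struct `Node` containing a union; it is not `xnd_t` or `ndt_t`. The offsets come from the platform's C ABI rather than being hard-coded, and every union member starts at offset 0. Since LLVM IR has no union type, the union can only be modelled there as a block of bytes. Everything outside the getters sees the struct only as `bytes`, and the small getter functions play the role of the C getters.

```python
import ctypes


class Payload(ctypes.Union):
    # Every member starts at offset 0; the union is as large as its largest member (plus padding).
    _fields_ = [
        ("as_int", ctypes.c_int64),
        ("as_double", ctypes.c_double),
        ("as_chars", ctypes.c_char * 12),
    ]


class Node(ctypes.Structure):
    # Hypothetical struct; the padding between tag and payload is platform dependent.
    _fields_ = [("tag", ctypes.c_int32), ("payload", Payload)]


# Getter layer: the only code that knows field offsets. In the approach above these would be
# C functions compiled with the library; here ctypes asks the C ABI for the same numbers.
def node_size():
    return ctypes.sizeof(Node)


def node_get_tag(raw):
    return ctypes.c_int32.from_buffer_copy(raw, Node.tag.offset).value


def node_get_payload_int(raw):
    offset = Node.payload.offset + Payload.as_int.offset
    return ctypes.c_int64.from_buffer_copy(raw, offset).value
```

Usage: build `n = Node(tag=7)`, set `n.payload.as_int`, and pass `bytes(n)` around as the opaque byte array. Only the getters need to know where `tag` and `payload` live.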
He did suggest, however, that instead of building a shared library from `xnd_structinfo.c`, we compile it to LLVM IR (text). At runtime we would then load that IR, create a module around it (using llvmlite), and link it with the jitted module. That way Numba knows the memory layout of all the structs at compile time and can optimize accesses to them. At least, that was his guess; he was curious whether it would actually make any difference in performance. However, this depends on being able to call the `add_linking_library` method on an LLVM module after Numba has finished with it, which currently isn't exposed to users. So my take is that we wait for them to expose it and don't worry about it unless we really want to. Hopefully it won't change how we use any of the functions at all: they should have the same names whether they live in a shared library or are loaded from another LLVM module.
Array optimizations
Right now in Numba, if you implement something "array-ish", as we are doing with xnd, you don't get any of its rewriting magic (parallelization, combining instructions, pipelining operations). To get that, you have to actually be a Numba "Array" type or a subclass of one. Numba has no notion of interfaces that would say explicitly what something needs to provide to act enough like an array to get these optimizations.
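Python's own `typing.Protocol` gives a flavor of what such an interface declaration could look like. The sketch assumes a hypothetical `ArrayLike` protocol that requires only `shape` and `__getitem__`; Numba has no such hook today. Any object that provides those members is accepted structurally, with no subclassing of a concrete array type.

```python
from typing import Protocol, Tuple, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    """Hypothetical minimal interface an array optimizer could require."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    def __getitem__(self, index): ...


def total(a: ArrayLike):
    """Sum a 1-D array-like using only what the protocol promises."""
    (n,) = a.shape
    return sum(a[i] for i in range(n))


class ListArray:
    """A toy container that subclasses no array type but still satisfies ArrayLike."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def shape(self):
        return (len(self._values),)

    def __getitem__(self, index):
        return self._values[index]
```

A plain `list` fails the check because it has no `shape`, while `ListArray` passes. That is the kind of explicit "enough like an array" contract the paragraph says is missing.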
When @teoliphant and I discussed this tonight, we wondered whether this is a good opportunity to look into creating an array API in Python. In other words, can we refactor Numba somehow so that it depends on some array-ish interface to do its optimizations? Could this tie into MoA (Mathematics of Arrays), so that Numba could use it to do some of the algebra on array operations, maybe by using SymPy or something inspired by it? And could this tie in at all with the goal of SciPy on GPU?
Numba already performs a form of symbolic reduction on array operations as it tries to optimize them. Can we make this more explicit and separate some of it out of Numba itself? It could be a symbolic representation of array operations (like concat, etc.) that operates explicitly on indices and shows how they change. For example, `Index(Concat(A, B), ConcatVec(Vec(i), js))` translates to:
If that makes sense. Then, with a bunch of these expressions combined, you need some kind of math compiler to inline whatever we know at compile time, and the resulting reduced expression could be lowered with Numba. Siu suggested that this sort of thing could perhaps emit LLVM directly, which could make sense. The magic part would then be using it inside Numba to rewrite existing NumPy expressions with this algebra (given rules for them), replacing some of Numba's array optimizations.
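One plausible way to spell out such an index rule is sketched below in Python. It is an assumption, not necessarily the translation intended above. It assumes concatenation along the first axis and encodes the usual rule for indexing into `Concat(A, B)`: read from `A` when the leading index is below `A`'s length, and otherwise read from `B` with the index shifted by that length. All class names are hypothetical stand-ins for the `Index`/`Concat`/`ConcatVec(Vec(i), js)` notation, with the index vector split into a leading `i` and the rest `js`. When `i` is known at compile time, the rule picks a branch outright, which is the kind of inlining mentioned. When `i` is symbolic, a runtime `Select` remains.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Sub:
    x: object  # a symbolic index (str) or another Sub ...
    k: int     # ... minus a compile-time constant


@dataclass(frozen=True)
class Arr:
    name: str
    n: int  # length along axis 0, known at compile time


@dataclass(frozen=True)
class Concat:
    a: object  # Arr or Concat
    b: object


@dataclass(frozen=True)
class Index:
    array: Arr
    i: object  # int (known), str (symbolic), or Sub
    js: Tuple[object, ...]


@dataclass(frozen=True)
class Select:
    i: object  # runtime test: i < bound ?
    bound: int
    then: object
    otherwise: object


def length(e):
    return e.n if isinstance(e, Arr) else length(e.a) + length(e.b)


def reduce_index(array, i, js):
    """Push an index through Concat nodes until only plain array reads remain."""
    if isinstance(array, Arr):
        return Index(array, i, tuple(js))
    n = length(array.a)
    if isinstance(i, int):  # known at compile time: choose the branch now
        if i < n:
            return reduce_index(array.a, i, js)
        return reduce_index(array.b, i - n, js)
    # unknown until run time: keep both branches behind a comparison
    return Select(i, n, reduce_index(array.a, i, js), reduce_index(array.b, Sub(i, n), js))


def evaluate(e, arrays, env):
    """Reference interpreter used to check the rewrite against real nested lists."""
    def val(x):
        if isinstance(x, int):
            return x
        if isinstance(x, str):
            return env[x]
        return val(x.x) - x.k

    if isinstance(e, Select):
        return evaluate(e.then if val(e.i) < e.bound else e.otherwise, arrays, env)
    item = arrays[e.array.name][val(e.i)]
    for j in e.js:
        item = item[val(j)]
    return item
```

Here `evaluate` exists only to check that the reduced expression reads the same element as indexing the concatenated data directly. A real system would lower the reduced form instead.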
The premise is that Numba already has a way of compiling array expressions to make them fast. So any interface/algebra we come up with for arrays should support the same kinds of optimizations Numba does; otherwise it's not a good enough abstraction.
Related to array optimizations, I just came across this Haskell library we could probably learn from, https://github.com/lehins/massiv, since they have done some thinking about what constitutes the different types of arrays. There is a corresponding slideshow about it. Also, Pythran, a Python (really NumPy) to C++ compiler, supposedly generates very fast code, so it probably has some useful ways of compiling array expressions.