-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support non-ndarray computations, cache unit calls, and add slots to DataArray #47
base: master
Are you sure you want to change the base?
Conversation
for name, value in array_dict.items(): | ||
if not isinstance(value, np.ndarray): | ||
array_dict[name] = np.asarray(value) | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a temporary and dirty solution. We could think of a mechanism to control whether arrays must be coerced or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't merge this change as-is, what problem is being solved here and what other solutions are available for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The asarray
function of Numpy seeks to coerce the input array-like storage value
into a ndarray
. This operation could break e.g. the data layout and the memory alignment of value
. In the specific case of Tasmania, value
could be a GT4Py storage, whose low-level details and features are fitted to the target computing architecture and thus must be preserved. The problem can be circumvented by monkey-patching Numpy via the function gt4py.storage.prepare_numpy()
, but this is much GT4Py-specific. We could think of a more organic solution, or just pass value
to DataArray
as it is and let DataArray
perform all type checks (and eventually throw exceptions).
@@ -4,6 +4,7 @@ | |||
|
|||
|
|||
class DataArray(xr.DataArray): | |||
__slots__ = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the implications of setting this? What warning is it suppressing, and what behavior does it cause when you set this to an empty list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is aimed to suppress FutureWarning: xarray subclass DataArray should explicitly define __slots__
. Here is a nice explanation of how __slots__
work.
@@ -43,13 +43,16 @@ def get_numpy_array(data_array, out_dims, dim_lengths): | |||
dict of dim_lengths that will give the length of any missing dims in the | |||
data_array. | |||
""" | |||
if len(data_array.values.shape) == 0 and len(out_dims) == 0: | |||
return data_array.values # special case, 0-dimensional scalar array | |||
if len(data_array.data.shape) == 0 and len(out_dims) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change alters the behavior of this function, which is OK, but the variable names, function name, file name, and docstring need to be updated. For example, I would suggest naming the function something like get_underlying_data
.
The tests in test_get_restore_numpy_array.py
should also be updated to cover cases where data
is not a numpy array (and have similar re-namings).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. I will rename the function and update the tests.
for name, value in array_dict.items(): | ||
if not isinstance(value, np.ndarray): | ||
array_dict[name] = np.asarray(value) | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't merge this change as-is, what problem is being solved here and what other solutions are available for it?
Could you also please update the PR name with a brief description of what these changes do? e.g. "support non-ndarray computations, cache unit calls, add slots to DataArray"? If the name is too long, these can be put into separate PRs. The unit caching for example is mergeable as-is. |
I think the new name is not too long ;) |
A few disparate minor changes.
ndarray
API (in the specific case: GT4Py) by (i) avoiding axes transposition unless necessary, (ii) accessing thedata
rather than thevalues
attribute ofDataArray
, and (iii) avoiding explicit coercion.UnitRegistry
to improve performance.__slots__
class attribute toDataArray
to suppress warning.