Are we ndarray yet? #597

Open
14 of 31 tasks
LukeMathWalker opened this issue Mar 17, 2019 · 13 comments
Labels
good first issue A good issue to start contributing to ndarray! help wanted

Comments

@LukeMathWalker
Member

LukeMathWalker commented Mar 17, 2019

Purpose

The idea behind this collection is to provide an index for easily navigating all currently open ndarray issues that are immediately actionable.
This is meant to be a good starting point for new contributors (e.g. what should I work on?) and it can also help existing contributors to identify trends and hot areas. I have pinned it using GitHub's new feature, so that it doesn't get lost (and stale).

Given that we have ~100 open issues (and more are opened every day), you are very welcome to contribute to this taxonomy effort, either by commenting on this issue or by editing it directly (if you have permissions to do so).
I am only adding to this tracker things I can easily understand or where enough context is provided in the issue. If I left something out along the way, feel free to add it and to provide more info on it.

New functionality

Documentation

  • Guidelines on how to use ndarray's types in a public API (Similar to Vec<T> vs &[T] considerations)
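As a rough illustration of the Vec<T> vs &[T] consideration this item alludes to, here is a minimal std-only sketch. The function names are hypothetical. The assumed ndarray analogue is to accept a borrowed view such as ArrayView1 where read-only access suffices and an owned Array only when the function needs to keep or reuse the buffer; that mapping is a guess at what such guidelines might say, not an established recommendation.

```rust
/// Borrowing signature: callers can pass a Vec, a fixed-size array, or a
/// sub-slice without giving up ownership or copying.
fn sum_of_squares(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum()
}

/// Owning signature: only worth it when the function reuses the buffer,
/// as here, where the input allocation becomes the output.
fn normalized(mut values: Vec<f64>) -> Vec<f64> {
    let norm = sum_of_squares(&values).sqrt();
    if norm > 0.0 {
        for v in &mut values {
            *v /= norm;
        }
    }
    values
}
```

The borrowing version places the fewest demands on callers; the owning version avoids an allocation when the caller no longer needs the input.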

Feature parity

Interop / Finer-grained control

Ergonomics

Quality of life

Other

Improvements

Documentation

Error messages / Debugging

Sharp API edges/corner cases

Core

Performance

@LukeMathWalker
Member Author

Going through all of these issues, I have started to think about broader challenges which should probably fall under ndarray's umbrella or are relevant to the project:

  • masked arrays
  • zero-cost interop with other scientific stacks using the Apache Arrow project
  • numpy.einsum equivalent
  • consolidating all currently maintained and mature ndarray-* crates into the rust-ndarray organization, harmonizing interfaces and integrating docs where appropriate
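To make the first item concrete, here is a minimal std-only sketch of what a masked array could look like. The MaskedArray type and its methods are hypothetical and not part of ndarray. It assumes the numpy.ma convention that a true mask entry marks an element as excluded.

```rust
/// Hypothetical 1-D masked array: `mask[i] == true` means `data[i]` is
/// excluded from computations (the numpy.ma convention).
struct MaskedArray {
    data: Vec<f64>,
    mask: Vec<bool>,
}

impl MaskedArray {
    fn new(data: Vec<f64>, mask: Vec<bool>) -> Self {
        assert_eq!(data.len(), mask.len(), "data and mask must have equal length");
        MaskedArray { data, mask }
    }

    /// Mean over the unmasked elements, or None if every element is masked.
    fn mean(&self) -> Option<f64> {
        let mut sum = 0.0;
        let mut count = 0usize;
        for (&value, &masked) in self.data.iter().zip(&self.mask) {
            if !masked {
                sum += value;
                count += 1;
            }
        }
        if count == 0 {
            None
        } else {
            Some(sum / count as f64)
        }
    }
}
```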

@oracleofnj

I've started taking a crack at einsum here. The implementation I have there has multiple issues (performance and otherwise) and is not at all ready for production, but is apparently correct. I'm actively working on improving the implementation. There's a web frontend that uses the crate as a WASM module deployed here.

@LukeMathWalker
Member Author

The front-end is what I dreamed I could have when I started to use np.einsum back in the day - quite cool @oracleofnj!
Parsing the output correctly is definitely the first step there - then it comes down to properly optimizing the computation path based on the inputs and the specified contractions. What is your attack plan @oracleofnj?

@oracleofnj

After reading through the implementations/documentation in numpy and opt_einsum, I'm writing the base cases to handle a single operand or a pair of operands and then I'll write a function that takes the general case along with a pre-specified path and iterates along the path using the base cases. Last will come an independent function (or functions) to optimize the path given the operand sizes.
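The plan above can be sketched roughly as follows. This is not the crate's code: it is a std-only illustration in which the only pairwise base case is the matrix contraction "ij,jk->ik", operands are plain nested Vecs, and the path format (a list of positions i, each meaning "contract operands i and i+1 and put the result back at position i") is an assumption made for this sketch. All names are hypothetical.

```rust
/// Dense row-major matrix; a stand-in for a real n-dimensional array type.
type Matrix = Vec<Vec<f64>>;

/// Base case for a pair of operands: the contraction "ij,jk->ik".
/// Assumes both matrices are non-empty and rectangular.
fn contract_pair(a: &Matrix, b: &Matrix) -> Matrix {
    let inner = b.len();
    let cols = b[0].len();
    let mut out = vec![vec![0.0; cols]; a.len()];
    for (i, row) in a.iter().enumerate() {
        assert_eq!(row.len(), inner, "inner dimensions must agree");
        for (j, &a_ij) in row.iter().enumerate() {
            for k in 0..cols {
                out[i][k] += a_ij * b[j][k];
            }
        }
    }
    out
}

/// General case: walk a pre-specified path, reducing the operand list by one
/// at every step using the pairwise base case.
fn contract_along_path(mut operands: Vec<Matrix>, path: &[usize]) -> Matrix {
    for &i in path {
        let right = operands.remove(i + 1); // remove the later operand first
        let left = operands.remove(i);
        operands.insert(i, contract_pair(&left, &right));
    }
    assert_eq!(operands.len(), 1, "the path must reduce the list to one operand");
    operands.remove(0)
}
```

For three operands A, B, C, the path [0, 0] computes (AB)C and the path [1, 0] computes A(BC). Both give the same result at different cost, which is the choice a separate path optimizer would make from the operand sizes.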

@oracleofnj

I published a beta version of my crate to crates.io. It still has some issues but it's far enough along that you are welcome to give it a spin. There is a minimal example (and more in the tests/benches) at the crate repo where you should feel free to open any issues - we can move the discussion there.

@TheButlah

TheButlah commented Dec 19, 2020

Just came across some missing functionality that might want to be tracked here: #865
Equivalent numpy feature: slicing on a variable number of indices

@lucascolley
Contributor

lucascolley commented Aug 11, 2024

If you would like a slightly easier task than implementing all of NumPy, a fantastic start would be to follow the Python array API standard specification (the parts that are relevant to Rust!)

@bionicles

bionicles commented Jan 7, 2025

I realize using iterator / loops is likely required here, but could we please spike a project to implement the array API standard @lucascolley mentioned?

Seems like ndarray is missing various functions from numpy. Just going through my jax primitives module and searching the ndarray rustdoc, here's a list of numpy functions / methods I use in Python and cannot locate great alternatives for in the ndarray docs:

  • where
  • atan2
  • maximum
  • minimum
  • clip
  • full_like
  • as_array exists but didn't work for a f64 scalar
  • full (is fill ?)
  • nan_to_num
  • mod
  • ones_like
  • square
  • tanh
  • zeros_like
  • select (ndarray select does something else)
  • broadcast_arrays (a variadic broadcast)
  • argmax
  • count_nonzero
  • power (elementwise raise lhs to rhs)
  • greater (closest is MathCell::gt, presumably on elements ?)
  • less

Do any of these have alternatives in ndarray that I missed, maybe with different names?
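For reference, the semantics of a few of the listed functions can be written as small free functions over slices using only the standard library. This is a sketch of the behavior being asked for, not of anything ndarray provides. The names mirror numpy, and the same closures could presumably be handed to ndarray's elementwise mapping methods such as mapv.

```rust
/// numpy.where(cond, x, y): take from `x` where `cond` is true, else from `y`.
fn where_select(cond: &[bool], x: &[f64], y: &[f64]) -> Vec<f64> {
    assert!(cond.len() == x.len() && x.len() == y.len(), "lengths must match");
    cond.iter()
        .zip(x.iter().zip(y))
        .map(|(&c, (&a, &b))| if c { a } else { b })
        .collect()
}

/// numpy.clip(a, lo, hi). f64::clamp panics if lo > hi or a bound is NaN.
fn clip(a: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    a.iter().map(|&v| v.clamp(lo, hi)).collect()
}

/// numpy.maximum(a, b), elementwise. Caveat: f64::max ignores a NaN operand,
/// whereas numpy.maximum propagates it.
fn maximum(a: &[f64], b: &[f64]) -> Vec<f64> {
    assert_eq!(a.len(), b.len(), "lengths must match");
    a.iter().zip(b).map(|(&x, &y)| x.max(y)).collect()
}

/// numpy.argmax for 1-D input: index of the first largest element.
/// NaN handling is left out of this sketch.
fn argmax(a: &[f64]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, &v) in a.iter().enumerate() {
        if best.map_or(true, |(_, b)| v > b) {
            best = Some((i, v));
        }
    }
    best.map(|(i, _)| i)
}

/// numpy.count_nonzero for 1-D input.
fn count_nonzero(a: &[f64]) -> usize {
    a.iter().filter(|&&v| v != 0.0).count()
}
```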

Not too bad really, but that is quite a lot of basic stuff to re-invent. Especially if someone's new to Rust and/or doesn't know really well how ndarray works, missing some of these functions could discourage folks from adopting the library.

I'm in a mood where "I don't want to play with Python anymore" and, especially with autograd coming to Rust, ndarray could be the future of array programming for ML. Having a more numpy-compatible API would make it way easier to convert pythonistas into rustaceans, right?

Mostly using polars, might stick more with that, but who's down for a cooperative array standard TDD hackathon?

@lucascolley
Contributor

> could we please spike a project to implement the array API standard

I believe @bluss was experimenting with something along these lines.

@lucascolley
Contributor

Actually, I think that's a bit misleading - I remember mention of experimenting with Python bindings, not so much extending the Rust API to new functions.

@akern40
Collaborator

akern40 commented Jan 8, 2025

I have some code that starts in on this, in a sort of "Python Array API meets Rust ecosystem" approach. I was starting with implementing the mathematical functions in num-traits (see #1462), but I got derailed working on a trait that could be used to accept scalars, arrays, and vecs/slices. See #1469 for that work. I'll get a PR in soon that implements that math stuff.
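To give a feel for the kind of trait described here, the following is a hypothetical std-only sketch, not the design in #1469. The trait name AsElements and the function square are made up for illustration, and array types are left out so the example stays self-contained. The idea is that one generic function accepts a scalar, a Vec, a slice, or a fixed-size array by viewing each as a slice of elements.

```rust
/// Hypothetical trait: anything that can be viewed as a run of f64 elements.
trait AsElements {
    fn as_elements(&self) -> &[f64];
}

impl AsElements for f64 {
    fn as_elements(&self) -> &[f64] {
        // A scalar is viewed as a one-element slice, without copying.
        std::slice::from_ref(self)
    }
}

impl AsElements for [f64] {
    fn as_elements(&self) -> &[f64] {
        self
    }
}

impl AsElements for Vec<f64> {
    fn as_elements(&self) -> &[f64] {
        self.as_slice()
    }
}

impl<const N: usize> AsElements for [f64; N] {
    fn as_elements(&self) -> &[f64] {
        self.as_slice()
    }
}

/// One elementwise function that works for every implementor above.
fn square<T: AsElements + ?Sized>(input: &T) -> Vec<f64> {
    input.as_elements().iter().map(|x| x * x).collect()
}
```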

I'm coming around to the idea that this functionality should exist, but I do wonder whether it should be in a different crate. It could even be in the ndarray organization, with @bluss and other maintainers' blessing. I can see arguments both for a "batteries included" approach and for an "avoid crate bloat" approach, and I'm not sure which is better. The other possible argument for another crate is that some of those functions, when optimized, may get complicated. For example, should pow specialize low-power arguments to do binary exponentiation and thereby enable SIMD instructions? Those questions could create a complex implementation that may be cumbersome to maintain (and document) in the same codebase as the core memory management and looping functionality.
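As an illustration of the pow question, here is a minimal std-only sketch of binary exponentiation for non-negative integer powers, with a few small exponents special-cased. The function names and the choice of which exponents to specialize are assumptions, and whether a compiler actually vectorizes these loops has not been verified here.

```rust
/// Binary exponentiation (exponentiation by squaring): O(log exp) multiplies.
fn pow_by_squaring(mut base: f64, mut exp: u32) -> f64 {
    let mut acc = 1.0;
    while exp > 0 {
        if exp & 1 == 1 {
            acc *= base; // this bit of the exponent contributes a factor
        }
        base *= base;
        exp >>= 1;
    }
    acc
}

/// Elementwise integer power with special cases for small exponents, so the
/// common paths are plain multiply loops over the input.
fn pow_elementwise(values: &[f64], exp: u32) -> Vec<f64> {
    match exp {
        0 => vec![1.0; values.len()],
        1 => values.to_vec(),
        2 => values.iter().map(|&v| v * v).collect(),
        _ => values.iter().map(|&v| pow_by_squaring(v, exp)).collect(),
    }
}
```

Even this small version shows how the branching grows; negative and fractional exponents would add further paths.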

Also, there's an issue/PR floating around somewhere that discusses some of the math implementations, and talks about the fact that they should probably be lazy. I'm currently of the opinion that waiting for that capability is letting the perfect be the enemy of the good, but others may disagree.

I do think more / more detailed / more accessible documentation would really help with the crate's discoverability, making a "batteries included" approach more feasible. I've got a first cut at that work sitting around as well. I'll try to show it off soon, perhaps I can show an example this weekend or next.

@akern40
Collaborator

akern40 commented Jan 8, 2025

Oh sorry wanted to say: I'd be happy to host a public test repo under my name where we can hack away. Most of this shouldn't need to use the non-public API, so we can do it with ndarray as a dependency. Then we can discuss while also making some progress. Let me know if that's of interest.

(P.S. I am particularly fond of an ndarray-related crate/library called numbrs. Maybe that's a bad name. But just wanted to put it out for consideration 😄 )

@bionicles

bionicles commented Jan 8, 2025

managed to get em all proptested except broadcast_arrays and asarray (not sure i need em, or how to handle it)

it was fun and the free functions seem quite user-friendly

mod in numpy is "rem" in rust btw, but i made a function named "modulo" to avoid a conflict with the reserved word "mod"
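One caveat on the mod/rem correspondence, offered as a sketch rather than a description of the code in this comment: numpy.mod returns a result with the sign of the divisor, while Rust's % operator returns a result with the sign of the dividend, so the two differ when the operands have opposite signs. A floored variant that matches numpy's convention could look like this (the implementation is an assumption; only the name "modulo" comes from the comment):

```rust
/// Floored modulo: the result takes the sign of the divisor `b`, matching
/// numpy.mod, unlike Rust's `%`, whose result takes the sign of `a`.
fn modulo(a: f64, b: f64) -> f64 {
    let r = a % b;
    if r != 0.0 && (r < 0.0) != (b < 0.0) {
        r + b
    } else {
        r
    }
}
```

For example, modulo(-7.0, 3.0) gives 2.0 where -7.0 % 3.0 gives -1.0. Note that f64::rem_euclid is a third convention: it is never negative, so it agrees with numpy only for positive divisors.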

some highlights

[three images]


6 participants