Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC]: achieve ndarray API parity with built-in JavaScript arrays #61

Closed
6 tasks done
rxbryan opened this issue Mar 26, 2024 · 6 comments
Closed
6 tasks done

[RFC]: achieve ndarray API parity with built-in JavaScript arrays #61

rxbryan opened this issue Mar 26, 2024 · 6 comments
Labels
2024 2024 GSoC proposal. rfc Project proposal.

Comments

@rxbryan
Copy link

rxbryan commented Mar 26, 2024

Full name

Bryan Elee Atonye

University status

Yes

University name

University of Port Harcourt

University program

Mathematics and Computer Science

Expected graduation

July

Short biography

My name is Bryan Elee. I am in my final year pursuing a degree in Mathematics and Computer Science. I recently completed my final exam and project defense, hence I'm awaiting graduation in the next couple months.
I possess a strong foundation in various programming languages, with over 5 years of programming experience with C/C++, Python, and JavaScript, honed through academic studies, practical projects and internships. I have previously participated in the Google Summer of Code 2022 under the Metacall Organization and in the summer of Bitcoin program last year working under the Ledger Organization. This experiences solidified my ability to work effectively within open-source communities and collaborate with experienced developers.
I'm interested in machine learning, especially the field of reinforcement learning. I did some work on reinforcement learning last year and I am very excited about the possibilities offered by this technology.

Specific Achievements:

  • Summer of Bitcoin: Developed "Resigner," a general-purpose hot signing service in Python for Ledger Organization. This project demonstrates my ability to tackle complex tasks (Miniscript language, cryptographic functionalities) and deliver real-world solutions. Project Link

  • Google Summer of Code: Refactored the Metacall core library to a plugin architecture. This experience showcases my proficiency in C/C++ development Project Documentation:

Timezone

UTC +1

Contact details

[email protected]

Platform

Linux

Editor

Sublime Text

Programming experience

I began programming before university in 2018. I started out writing shell scripts, moved on to C/C++, then the Python programming language, Javascript and NodeJS. I am self taught in the above languages, I was usually motivated by some project I was developing. I have worked on a couple projects but I am most proud of Resigner.

Resigner is an easy to program hot signing service for miniscript policies. The Resigner countersigns transactions (according to some rules (spending conditions), set in advance in the configuration file, for example “no more than 1 million satoshis per day” before the transaction is broadcast to the bitcoin network. It provides the following features:

  • Enforce preset rules (spending conditions) on transactions.
  • Stealing a user's key is not sufficient to steal funds.
  • The user can recover funds if the service is no longer available, after a given period of time (as specified by the locking script).

It acts as a trusted third party in multiparty transactions enforcing previously agreed conditions

JavaScript experience

I have about 3 years of experience writing javascript programs. I have two published npm packages http-date and http-preconditions. I also have some experience doing backend web development using NodeJS, Express.
I have contributed Javascript to a few open source projects such as

My favourite feature in javascript would be function prototypes. While this pattern has fallen out of favour being replaced by the class syntax, the prototype pattern provides an interesting approach for dynamic inheritance of object properties and behaviour.

My least favourite feature in Javascript is the event loop. While the event loop is responsible for the asynchronous behaviour in javascript, it is also makes writing true multithreaded javascript applications very difficult. Any attempt at optimising javascript code requires deep understanding of the nature of the event loop and how it affects the specific code being optimised. This experience is not readily available.

Node.js experience

My experience with NodeJS is quite extensive. I have some experience modifying NodeJS source code and compiling the Library for embedding purposes. Some of my experience developing node native addons and embedding NodeJS comes from contributing to the development of the node loader in metacall core. This draft PR contains a lot of my work in embedding nodejs. It was used as the base for implementing the feature for exporting classes and objects form nodejs to metacall.
I also have some experience developing web applications using nodejs, express.js. I have also published some npm packages as I have elaborated on in the javascript section

C/Fortran experience

The C programming language is the first language I learnt, the second being C++. It is the language that I have clocked the most years of experience. I used C extensively while paticipating in the summer of code 2022 under metacall and I also worked on some personal projects using C.
Some of my contributions to open source projects using C include
metacall/core#289
metacall/core#270
metacall/core#287
metacall/core#298
Some of these merged PRs include C++ code. But still demonstrates my the requisite skill

Interest in stdlib

My interest in Stdlib is twofold.

  • I am a mathematics undergraduate and I'm interested in projects that allow me to utilise my mathematical knowledge. Hence the fact that Stdlib is involved in implementing linear algebra, statistical and numerical analysis routines piqued my interest.
  • Stdlib is a polygot project, utilising three of my favourite programming languages and technologies C, JavaScript, NodeJS in ways that not only test my skill but improves my understanding of them.

Version control

Yes

Contributions to stdlib

Merged contributions

refactor: update blas/ext/base/sapxsumpw to follow current project conventions
refactor: update blas/ext/base/scusumors to follow current project conventions
refactor: update blas/ext/base/scusumpw to follow current project conventions
refactor: update blas/ext/base/sapx to follow current project conventions

Goals

The goal of this project is to achieve API parity for Stdlib native ndarray with built-in JavaScript Array. Of all the existing JavaScript array method only the at and slice methods exist in ndarray.

Each of the APIs is a standalone package in either the @stdlib/ndarray/base or @stdlib/ndarray directory

Each package would have this file structure

Package_name
             |benchmark
             |docs
                       |types
                        repl.txt 
             |examples
             |lib
             |test
             Readme
             package.json

The following APIs will be implemented during the course of this project:

ndarray slice semantics for representing indices

APIs taking an Index or multiple indices will utilise the slice semantics. We shall use the slice API as it is, hence APIs such as fill, copywithin, splice etc shall take a slice object, array of slice objects or a multislice object.

Dimensionality Reduction

In APIs which it would be suitable to support operating over specific axes, we will be utilising approach used by numpy.
A null axis, (the default) is would perform the operation over all the dimensions of the input ndarray. If this is an array of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
For example, given a three dimentional ndarray, axis = 0 represent reducing along the depth. 1 represents represent reducing along the row and 2 represents represent reducing along the column

Accessors

ndarray APIs taking a callback such as unary implement optimised accessors for dimensions upto the 10d. We shall use this approach while implementing the APIs requiring callbacks

APIs

/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {ndarrayLike}y - input ndarray
* @throws {TypeError} first argument must be an ndarray
* @throws {TypeError} second argument must be an ndarray
* @returns {ndarray} ndarray view
*/
concat(x, y)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray 
* @param {...integer} target - Zero-based index at which to copy the sequence to.
* @param  {...*} s - slice arguments: a MultiSlice instance, an array of slice arguments, or slice arguments as separate arguments.
* @throws {TypeError} first argument must be an ndarray
* @throws {TypeError} index arguments must be integers
* @throws {RangeError} number of index arguments must equal the number of dimensions
* @returns {ndarray} target - ndarray view
*/
copywithin(x, target, s)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} target - input ndarray
* @param {Number} value
* @param  {...*} s - slice arguments: a MultiSlice instance, an array of slice arguments, or slice arguments as separate arguments.
* @throws {TypeError} first argument must be an ndarray
* @throws {TypeError} value and index arguments must be integers
* @throws {RangeError} number of index arguments must equal the number of dimensions
* @returns {ndarray} target - ndarray view
*/
fill( target, value[, s] )

APIs that take a callback

/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @param {...*} axis - null or int or array of ints, optional
* @throws {TypeError} first argument must be an ndarray
* @returns {ndarray} target - ndarray view
*/
filter(x, fcn[, axis])
/**
* Returns an ndarray element.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - unary callback
*  @param {...*} axis - null or int or array of ints, optional
* @throws {TypeError} first argument must be an ndarray
* @returns {*} ndarray element
*/
find(x, fcn[, axis])
/**
* Returns an ndarray element.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @param {...*} axis - null or int or array of ints, optional
* @throws {TypeError} first argument must be an ndarray
* @returns {*} ndarray element
*/
findlast(x, fcn[, axis])
/**
* Executes a provided function once for each array element.
* 
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @throws {TypeError} first argument must be an ndarray
* @returns {void} 
*/
foreach(x, fcn)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @returns {ndarray} ndarray 
*/
from(x [, fcn])
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
*  @param {...*} axis - null or int or array of ints, optional
* @throws {TypeError} first argument must be an ndarray
* @returns {ndarray} output ndarray 
*/
map(x, fcn[, axis])
/**
* Returns the value that results from running the "reducer" callback function to completion over the entire array..
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @param {...*} axis - null or int or array of ints, optional
* @returns {Number}  
*/
reduce(x, fcn [, initialvalue, axis])
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @param {...*} axis - null or int or array of ints, optional
* @returns {ndarray} output ndarray 
*/
reduceright(x, fcn [, initialvalue, axis])
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} [fcn] - optional callback
* @param {...*} axis - null or int or array of ints, optional
* @returns {ndarray} reference to input array
*/
sort(x, [, fcn, axis])
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} [fcn] - optional callback
* @param {...*} axis - null or int or array of ints, optional
* @returns {ndarray} A new array with the elements sorted in ascending order.
*/
tosorted(x [, fcn, axis])
/**
* Returns a boolean .
*
* we don't have support for `bool` in ndarray, hence we flatten the input ndarray and operate over all the single dimension
* 
* @param {ndarrayLike} target - input ndarray
* @param {Number} searchElement - The value to search for. 
 * @param  {...*} s - slice arguments: a MultiSlice instance, an array of slice arguments, or slice arguments as separate arguments.
* @throws {TypeError} first argument must be an ndarray
* @throws {TypeError} index arguments must be integers
* @throws {RangeError} number of index arguments must equal the number of dimensions
* @returns {boolean} 
*/
includes( x, searchElement [, s])
/**
* Returns a string.
*
* @param {ndarrayLike} x - input ndarray
* @param {String} seperator
* @throws {TypeError} first argument must be an ndarray
* @returns {String} 
*/
join(x, seperator)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @returns {ndarray} reference to input array
*/
reverse(x)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {...integer} [start] - Zero-based index at which to start changing the array, converted to an integer.
* @param {integer} [deletecount] - An integer indicating the number of elements in the array to remove from start.
* @param {*} [ item1, item2, /* …, */ itemN] - The elements to add to the array, beginning from start.
* @returns {ndarray} reference to input array
*/
splice(x, start [, deleteCount, item1, item2, /* …, */ itemN])
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @returns {ndarray} A new array containing the elements in reversed order.
*/
toreversed(x)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {...integer} [start] - Zero-based index at which to start changing the array, converted to an integer.
* @param {integer} [deletecount] - An integer indicating the number of elements in the array to remove from start.
* @param {*} [ item1, item2, /* …, */ itemN] - The elements to add to the array, beginning from start.
* @throws {TypeError} first argument must be an ndarray
* @returns {ndarray} A new array that consists of all elements before start, item1, item2, , itemN, and all elements after start + deleteCount.
*/
tospliced(x, start [, deleteCount, item1, item2, /* …, */ itemN])
/**
* Returns a string.
*
* @param {ndarrayLike} x - input ndarray
* @throws {TypeError} first argument must be an ndarray
* @returns {String} A string representing the elements of the array.
*/
tostring(x)
/**
* Returns A new iterable iterator object.
*
* @param {ndarray} x - input array
* @returns {Iterator} iterator
*/
values(x)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {Callback} fcn - A function to execute for each element in the array.
* @returns {ndarray} ndarray 
*/
flatmap(x , fcn)
/**
* Returns an ndarray.
*
* @param {ndarrayLike} x - input ndarray
* @param {...*} axis - null or int or array of ints, optional
* @param {integer} [depth]  
* @returns {*} ndarray element
*/
flat(x [,depth, axis])

Why this project?

Ndarrays are foundational to working with the stdlib library. They provide an efficient way to work with multi-dimensional numerical data. This project is a high priority for Stdlib for the fore-mentioned reason. It adds APIs that would be utilised in every package in the library.
The Knowledge of working with multi-dimensional numerical data is a highly valuable skill for data science and machine learning, career paths I intend on pursuing. A significant portion of data science and machine learning involves working with numerical data, often organized in multi-dimensional structures like matrices and tensors. These structures represent complex relationships between features and observations. Understanding how to manipulate, analyze, and interpret this data is very important, this project hence affords me first hand experience with the ndarray object.
I also stand to gain knowledge optimal techniques and patterns for iterating multidimensional arrays, possibly other optimisation techniques that might be used during the course the project.

Qualifications

I have completed the course work for a degree in Mathematics and Computer science. The relevant courses to this project would be Linear algebra, Numerical analysis, Data structures and algorithms.

I am also acquainted with the book Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne. It helped develop my understanding of both data structures and algorithms.

I am also quite familiar with the emcascript specification. The definitions and implementations of the APIs will be informed by it

Prior art

The at and slice methods exist in ndarray. Various ndarray APIs have also being implemented. They will inform and guide our implementation of the project

Commitment

As stated in background section, I recently completed my final exam and project defense. Hence I'm free from any major commitments and will be able to give a ~40hr/week to this project

Schedule

Assuming a 12 week schedule,

  • Community Bonding Period: Though I have become acquainted with the stdlib community in this last month and have contributed a fair amount in its activities, the project turned out to be very complex and some key details regarding the APIs still have to be ironed out. Hence I will spend this time engaging the mentors in other to complete the implementation details

Each of the APIs to be implemented is standalone, and will not be considered implemented without its benchmarks, tests, documentations and examples. So rather than having a week for documentation, tests and so on...I intended to submit PRs to atleast 3 APIs per week.

  • Week 1 - Week 3: start coding findLast, includes, join, reduceRight, toreversed, tosorted, toSpliced, values

  • Week 4 - Week 6: (midterm): implement filter, find , forEach, includes, splice, copywithin, concat,sort, reverse

  • Week 7 - Week 9: Implement the remaining APIs , map, reduce, some, join, toString,

  • Week 10 - Week 12: because of the complex nature of the project, I intend to leave the last two weeks for review because I’m expecting a lot of reviews before we can get this code merged

Related issues

No response

Checklist

  • I have read and understood the Code of Conduct.
  • I have read and understood the application materials found in this repository.
  • I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
  • I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
  • The issue name begins with [RFC]: and succinctly describes your proposal.
  • I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.
@rxbryan rxbryan added 2024 2024 GSoC proposal. rfc Project proposal. labels Mar 26, 2024
@Pranavchiku
Copy link
Member

@rxbryan this looks good, thanks for the proposal! I am not so familiar with ndarray APIs so no detailed suggestions, generic ones include explaining implementation of one of the APIs, the necessary files you may need to add, changes, etc. Link the [Idea] issue in Related issues section. Rest all is good.

@rxbryan
Copy link
Author

rxbryan commented Mar 29, 2024

Thanks for the review @Pranavchiku.

Link the [Idea] issue in Related issues section. Rest all is good.

I don't think there's an issue related to this project. Should I create one?

@steff456
Copy link
Collaborator

steff456 commented Apr 1, 2024

@rxbryan thanks for opening this draft proposal!

I think overall the projects looks very good, I will ask you to add all the APIs in a list such that we can create in a future a tracking issue like this one. What I am seeing from this proposal is that you are not leaving room for review cycles, specially at the end of the project. It would be nice to leave room for these interactions and maybe if you are interested in writing a blog post for documenting your journey we can add it towards the final month of this proposal (please note that it is completely optional and it is up to you :) ).

@kgryte
Copy link
Member

kgryte commented Apr 1, 2024

Building on the previous comments, I have a few comments of my own:

  1. Any mutation APIs (e.g., push, shift, unshift, pop) are non-starters. ndarrays have fixed memory. So I am a bit curious why you included these in your proposal. Furthermore, how are these APIs suppose to work in a multi-dimensional context?
  2. Various APIs (e.g., every, some, etc) return a boolean. But this is a bit limiting, as we'd likely want to support operating over specific axes, resulting in an array of lower rank. Your proposal seems to propose flattening the input array. While useful in some contexts, this is limited. Furthermore, ndarrays APIs should not return scalars, but zero-dimensional ndarrays. Additionally, if you are planning to support dimensionality reduction, do note that we have yet to implement a bool dtype. So how will you support?
  3. Various APIs (e.g., indexOf, findIndex, etc) return indices. We only support up to int32/uint32 dtypes, but ndarrays are allowed to have more than 2**32-1 elements. How are you planning to support?
  4. In general, you need to rethink your API proposals to account for dimensionality reduction semantics.
  5. How will the reduction APIs work?
  6. For things like fill, you've specified integer indices. This likely does not make sense, as we're more likely to support slicing semantics. In fact, it is not clear why we'd want to support index arguments at all, given that views can be defined in userland.
  7. In general, based on your API documentation, I am not sure you've actually studied our existing ndarray packages. Many of the API proposals seem LLM generated and not grounded in how we'd design APIs.

@rxbryan
Copy link
Author

rxbryan commented Apr 1, 2024

  1. Any mutation APIs (e.g., push, shift, unshift, pop) are non-starters. ndarrays have fixed memory. So I am a bit curious why you included these in your proposal. Furthermore, how are these APIs suppose to work in a multi-dimensional context?

I assumed here that the user of this APIs (splice, push, shift, unshift, pop) would be more interested in manipulating the dimensions of the ndarray (ie the shape of the ndarray). We could use slice to extract a specific "sub-ndarray" from the n-dimensional array, while using assign, slice-assign to copy to a new ndarray.

we could then assign a placeholder to the original elements we want to effectively shift, or pop. We could do the reverse for shift and push, adding a sub-array

  1. Various APIs (e.g., every, some, etc) return a boolean. But this is a bit limiting, as we'd likely want to support operating over specific axes, resulting in an array of lower rank. Your proposal seems to propose flattening the input array. While useful in some contexts, this is limited.

Yes the proposal assumes that the input array will be viewed as a flat single dimensional array for the includes, every, some.

We only support up to int32/uint32 dtypes

my intention here was to implement the APIs within existing state of ndarray API which now seems a bit short sighted.

For things like fill, you've specified integer indices. This likely does not make sense, as we're more likely to support slicing semantics. In fact, it is not clear why we'd want to support index arguments at all, given that views can be defined in userland.

I would have to agree that the slicing semantics would be more appropriate. I think the proposal suffers alot from conforming to the JavaScript SPEC of the APIs

  1. In general, based on your API documentation, I am not sure you've actually studied our existing ndarray packages. Many of the API proposals seem LLM generated and not grounded in how we'd design APIs.

The API documentation was derived from https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array and the JSdoc from existing APIs in ndarray/base not any LLM. It wasn't the final implementation of the APIs nor was any attempt made to follow the project design principles. It was just meant to visualize my idea of the APIs. I quite sorry if it looks plagarised!

I have to admit that is proposal was over enthusiastic. The attempt to cover the entire JavaScript built-in prototype methods have broadened the scope of this proposal than it should have been. Implementing this proposal as it is would require implementing this project idea #43 and Uint64 dtype support. With just one day to end of project submission I don't think this project can be updated reasonably.
I'm willing to close the proposal if it doesn't quite suite the community's standards.

@kgryte
Copy link
Member

kgryte commented Apr 1, 2024

The project idea is not intended to match exact ECMAScript defined semantics for Array and TypedArray objects. Instead, the goal is provide "analogous" APIs which are conceptually similar to the methods on those objects. The current example is Array.prototype.slice and @stdlib/ndarray/slice. Those ideas are conceptually similar, but the latter is modified to match the semantics and constraints of multi-dimensional arrays. That should be similarly done for other proposed APIs. As a start, you could examine the APIs afforded by NumPy to better get a sense of multi-dimensional array design principles.

@kgryte kgryte closed this as completed Apr 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2024 2024 GSoC proposal. rfc Project proposal.
Projects
None yet
Development

No branches or pull requests

4 participants