Ndarray and Rust

This portion of the tutorial demonstrates how to implement common Matlab patterns in Rust using the ndarray ecosystem of linear algebra tools. Refer back to the project root to learn how to interface Rust + ndarray code with other languages, such as Python.

Quick Start

Refer back to the main quick start guide. As a brief review, if you haven't already done so, clone and test the project with:

git clone https://github.com/benkay86/matlab-ndarray-tutorial.git
cd matlab-ndarray-tutorial
cargo run --bin build_test

Then enter the subdirectory for the Rust portion of the tutorial and try running one of the examples located in src/bin.

cd rust
cargo run --bin matrix_creation

Read the following conceptual introduction and then proceed with How To Use This Tutorial.

Introduction

In brief, writing small programs in Rust takes more time and effort than writing them in Matlab. As the programs get larger and more sophisticated, writing and maintaining them in Rust often becomes easier than in Matlab. Programs written in Rust will generally run faster and use less memory than programs written in Matlab.

What is Rust/Ndarray?

Matlab is a domain-specific computer programming language for numerical computation. Rust is a general-purpose programming language suitable for anything from writing an operating system kernel to writing a website.

Ndarray is an ecosystem of libraries for numerical computation in Rust. The name is short for "N-Dimensional Array." Both Matlab and ndarray use the heavily optimized BLAS and LAPACK libraries for numerical computation on the backend, and will therefore have similar performance for basic numerical operations (e.g. matrix multiplication).

Rust's ndarray is closely related to another general-purpose programming language library: Python's numpy. If you are already familiar with numpy then you can skip this tutorial and read this documentation on ndarray for numpy users instead.

How are Matlab and ndarray different?

  • Matlab is proprietary software for numerical computation. Ndarray is open source.
  • In addition to basic linear algebra, Matlab offers all kinds of paid add-on packages for digital signal processing, statistics, machine learning, etc. Ndarray lacks many of these features.
  • The Matlab environment has a graphical user interface with a read-evaluate-print loop (REPL). Ndarray code must be compiled.
  • The Matlab language is designed with a focus on linear algebra, and not much else. The Rust language is general-purpose, so ndarray's syntax is necessarily more verbose.
  • Matlab is an excellent environment for rapidly prototyping numerical programs. Rust has a steeper learning curve, but it excels at memory-intensive operations, parallelism, and integration into production environments.

Why Rust/ndarray?

  • Rust's memory model enforces strict aliasing for Fortran-like performance in vectorized loops.
  • Rust makes it easy to write highly-performant parallel (i.e. multi-threaded) code, even nested parallel-for loops.
  • Rust and ndarray give you precise control over memory allocation for manipulating large data sets.
  • The Rust compiler makes you more productive by safeguarding against common errors before you run your code.
  • Rust supports integration with many other programming languages including C, C++, and Python.

Why not Rust/ndarray?

  • Rust is a relatively young programming language (version 1.0 was released in 2015). Some core features are not even implemented yet!
  • Ndarray does not have feature parity with numpy or Matlab yet.
  • Rust does not have the popularity/mindshare of Python, C++, or R.
  • Rust code will always be more daunting to newcomers than tweaking a Python or Matlab script.
  • Matlab's IDE is a more productive environment for rapid prototyping.

How To Use This Tutorial

This tutorial does not cover the basics of programming in Rust. The Rust Book is a great place to learn Rust.

Work through the examples in src/bin. As you work, make frequent reference to the ndarray and ndarray-linalg documentation so that you will become skilled at using these information sources. You are encouraged to edit, tweak, and experiment on the examples.

To run each example:

cargo run --bin name_of_example

Consider starting with the following examples, in this order. Once you have mastered these core examples you can proceed with the remaining examples in any order you wish.

  1. Matrix Creation
  2. Clearing Memory
  3. Indexing Matrices
  4. Checking Size
  5. Slicing Matrices
  6. Matrix Transpose (and broadcast)
  7. Assign to Matrices
  8. Concatenate Matrices
  9. Matrix Math
  10. For Loops
  11. Folding

Getting Help

The reader is assumed to be familiar with Matlab (or Octave) and to possess basic knowledge of Rust. For general questions about these languages, refer to the reference documentation (Matlab, Rust) or post in the general user forums (Matlab, Rust).

For help with ndarray, refer to the ndarray and ndarray-linalg documentation (especially the ArrayBase struct). Rust's ndarray is very similar to Python's numpy, so you may also find relevant solutions involving numpy. If you think you have found a bug, or if something seems very unclear, open an issue on the GitHub page for ndarray or ndarray-linalg. If you have issues with compiling or linking, try reading the comments in Cargo.toml for an explanation of the various dependencies.

Concepts

Representing Data in Memory

Matlab represents data in 2-dimensional arrays, or matrices. For example:

>> mat2d = [1,2,3; 4,5,6; 7,8,9]

mat2d =

     1     2     3
     4     5     6
     7     8     9
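For comparison, the same matrix can be sketched in plain Rust as a nested array. This uses only the standard library rather than ndarray's own array types (which the examples in src/bin cover), and the helper name `make_mat2d` is hypothetical, chosen to mirror the Matlab variable.

```rust
/// Build the same 3x3 matrix as the Matlab example, as a nested array.
/// (`make_mat2d` is an illustrative name, not part of the tutorial code.)
fn make_mat2d() -> [[i32; 3]; 3] {
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
}

fn main() {
    let mat2d = make_mat2d();
    // Print one row per line, similar to Matlab's display.
    for row in &mat2d {
        println!("{:?}", row);
    }
    // Rust indexes from zero, so Matlab's mat2d(2,3) is mat2d[1][2] here.
    println!("second row, third column: {}", mat2d[1][2]);
}
```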

The way in which Matlab and Rust's ndarray store data is actually very similar since both are designed to use a BLAS linear algebra library behind the scenes.

In the case of the 2-dimensional matrix above, we humans conceptualize the data as existing along two dimensions, or axes. Computers, however, do not innately understand this concept. To a computer the data always exist as one contiguous block of memory, regardless of how many dimensions the matrix has! To a computer, the data look like this (diagram taken from Stack Overflow):

diagram of matrix stride

The second row and third column of the data are at coordinates [1,2] in ndarray (which counts up from zero) or (2,3) in Matlab (which counts from one). To access the data at that location, the computer calculates an offset (in bytes) from the beginning of the array using the stride of each dimension and the size of each element. If the data are double-precision floats then the element size is 8 bytes. In the example above each row is stored contiguously, so the stride walking across a row from column to column (red arrow) is 1 element. The rows are stored one after another, so the stride walking down a column from row to row (green arrow) is 3 elements, the number of columns (that is, the length of one row). The offset for the second row, third column [1,2] is then 8 * (1*3 + 2*1) = 40 bytes.
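The offset arithmetic can be checked with a minimal sketch in plain Rust. It uses only the standard library, not ndarray, and assumes the row-by-row layout described for the diagram; the function name `byte_offset` and the flat `data` array are hypothetical, introduced only for illustration.

```rust
use std::mem::size_of;

/// Byte offset of the element at `index`, given one stride per axis
/// (measured in elements) and the element size in bytes.
/// (`byte_offset` is an illustrative helper, not an ndarray function.)
fn byte_offset(index: &[usize], strides: &[usize], elem_size: usize) -> usize {
    assert_eq!(index.len(), strides.len(), "need one stride per axis");
    // Sum index * stride over every axis to get the offset in elements.
    let elem_offset: usize = index.iter().zip(strides).map(|(i, s)| i * s).sum();
    elem_offset * elem_size
}

fn main() {
    // The 3x3 matrix from the text, laid out as one contiguous block, row by row.
    let data: [f64; 9] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0];
    let strides = [3, 1]; // row-to-row stride is 3, column-to-column stride is 1
    let index = [1, 2]; // second row, third column, counting from zero

    let bytes = byte_offset(&index, &strides, size_of::<f64>());
    println!("offset = {} bytes", bytes);

    // Dividing by the element size recovers the position in the flat array.
    println!("value = {}", data[bytes / size_of::<f64>()]);
}
```

With the values from the text, the function returns 8 * (1*3 + 2*1) = 40 bytes, which corresponds to position 5 in the flat array, where the value 6 is stored.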

Further reading.