GSoC 2019 | Week 1, 2

Background

NMatrix is the SciRuby’s fast numerical linear algebra library for Ruby, with dense and sparse matrices. It’s written in C and C++ (and with experimental JRuby support) and with some parts written in Ruby. As the source code is written in such way that data is interchanged between modules written in different languages, it requires converting the data(usually by wrapping and unwrapping) and this causes unnecessary overhead to computation. To overcome this, SciRuby contributors came up with NumRuby which is a re-implementation of NMatrix with modules re-organization such that the API can be easily understood by users and provide them with more richer API.

NumRuby can be viewed as a package for scientific numerical computation in Ruby. It consists of:

  • NMatrix: An API rich N-dimensional array object.
  • Linalg: NumRuby’s linear algebra capabilities.

NumRuby has been implemented such that all of the computation is done only in C and only initial and final data is interchanged between Ruby and C which eliminates unnecessary overheads. To see the speed improvements over previous NMatrix, benchmarks were written and can be found at https://github.com/SciRuby/numruby/tree/master/benchmark/bench.rb. The output of benchmarks was as follows:

#format: [dimensions length, time taken by operation]
{
  :addition => [
    [100, 6.0e-06], 
    [2500, 1.0e-05], 
    [10000, 3.5e-05], 
    [250000, 0.000653], 
    [1000000, 0.00308], 
    [4000000, 0.018201], 
    [9000000, 0.068491], 
    [16000000, 0.048806], 
    [25000000, 0.07285]
  ], 
  :subtraction => [
    [100, 4.0e-06], 
    [2500, 8.0e-06], 
    [10000, 3.2e-05], 
    [250000, 0.000754], 
    [1000000, 0.003138], 
    [4000000, 0.011785], 
    [9000000, 0.065212], 
    [16000000, 0.109191], 
    [25000000, 0.072283]
  ], 
  :mat_mult => [
    [100, 4.3e-05], 
    [2500, 5.4e-05], 
    [10000, 0.006222], 
    [250000, 0.022299], 
    [1000000, 0.122001], 
    [4000000, 0.699966], 
    [9000000, 2.155848], 
    [16000000, 4.650079], 
    [25000000, 10.762429]
  ]
}

The speedups as compared to previous NMatrix are as follows (for 25k elements):

  • Matrix addition: 100x
  • Matrix subtraction: 100x
  • Matrix multiplication(dgemm): 100x

Introduction

I have been working on NumRuby for 2 weeks now. I worked on adding indexing and iteration support. I started off with understanding how current indexing of 2-dimensional matrices is implemented. As I dig deep into the implementation, I saw that the code was written such that it can only work fro 2-d matrices. I designed a rough idea of how this should be to also support arbitrary-dimensional matrices. I also took inspiration from previous Nmatrix code and also went through numo-array code to get any useful insight.

Indexing

During the first week of GSoC coding period, I worked on adding indexing support for arbitrary dimensional matrices. Till now, NMatrix only supported 2-dimensional matrices and the code assumed NMatrix object to be of 2 dimensions. I first needed to modify code to store no. of dimensions and then write indexing conversions to map comma separated indexes to the index of corresponding element in the flat list of elements stored at back-end. This was done pretty easily with the help of strides which are the prefixes for each comma separated index value in the equation to calculate required index in flat list. Once the indexing part was done and working, I wrote the setters and getters for matrix elements. I also wrote some tests for indexing(setting and getting).

void get_stride(nmatrix* nmat, size_t* stride){
  size_t val = 1;
  for(int i = (nmat->ndims)-1; i >= 0; --i){ //using int here instead of size_t
    stride[i] = val;                         //because size_t does not support
    val *= nmat->shape[i];                    //decrement operator
  }
} 

The above function get_stride takes the matrix nmat, calculates stride from nmat->shape[] and store in stride[]. Strides, in other words, are tuple of bytes to step in each dimension when traversing an array. More details on strides can be found here. Stride can simplify element location calculation a lot as one just need to multiply corresponding indices and stride value and add all products to get the location.

size_t get_index(nmatrix* nmat, VALUE* indices){
  size_t index = 0;
  size_t* stride = ALLOC_N(size_t, nmat->ndims);
  get_stride(nmat, stride);
  for(size_t i = 0; i < nmat->ndims; ++i){

    if((size_t)FIX2LONG(indices[i]) >= nmat->shape[i] ||
          (int)FIX2LONG(indices[i]) < 0) {  //index out of bounds
      rb_raise(rb_eIndexError, "IndexError: index is out of bounds.");
    }

    index += ((size_t)FIX2LONG(indices[i]) * stride[i]);
  }
  return index;
}

get_index takes the matrix nmat and indices (input indices tuple) and uses stride from get_stride to calculate the location of element in flat list of elements. It also check for any out of bounds index access and throws exception in case if any index is out of bounds.

Sample indexing usage

[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=> 
[2] pry(main)> n[0, 0, 0] #getter using indexing
=> 1.0
[3] pry(main)> n[0, 0, 1]
=> 2.0
[4] pry(main)> n[1, 1, 1]
=> 0.0
[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=> 
[2] pry(main)> n[0, 1, 0]
=> 3.0
[3] pry(main)> n[0, 1, 0] = 10 #setter using indexing
=> 10
[4] pry(main)> n[0, 1, 0]
=> 10.0
[5] pry(main)> n.elements #print all elements of n
=> [1.0, 2.0, 10.0, 4.0, 5.0, 6.0, -7.0, 0.0]

Iteration

The second week of GSoC was spent in implementing iterators for NMatrix. NMatrix#each was easy to implement as it was simply yielding each element from the flat list of elements. It seemed to me that other iterators would also be as easy to implement but it wasn’t so as it took quite a lot of time to implement iterators with indexes. Iterators with indexes yield both the current element and the associated comma separated indexes with it. For increment of comma separated indexes after each iteration, I wrote function increment_index which would increment current values to next one.

void increment_state(VALUE* state_array, VALUE* shape_array, size_t ndims) {

  for (size_t index = ndims; index > 0; index--) {
    int curr_dim_index = (int)NUM2INT(state_array[index]);
    int curr_dim_length = (int)NUM2INT(shape_array[index - 1]);

    if (curr_dim_index + 1 == curr_dim_length) {
      curr_dim_index = 0;
      state_array[index] = INT2NUM(curr_dim_index);
    } else {
      curr_dim_index++;
      state_array[index] = INT2NUM(curr_dim_index);
      break;
    }

  }
}

increment_state takes the current state of indices tuple and increments it to next state. For example, a matrix nm has shape [2, 2, 3], then indices tuple [0, 0, 0] will be incremented to [0, 0, 1], [0, 0, 1] to [0, 0, 2] and that to [0, 1, 0] and so on. [0, 1, 2] to [1, 0, 0] and it will increment [1, 1, 2] to [0, 0, 0].

List of iterators implemented:

  • NMatrix#each
  • NMatrix#each_with_indices
  • NMatrix#each_row
  • NMatrix#each_column

Iterators yet to be implemented:

  • NMatrix#each_layer
  • NMatrix#each_rank
  • NMatrix#each_stored_with_indices
  • NMatrix#each_ordered_stored_with_indices
  • NMatrix#map_stored

Initially, I wrote each_row and each_column such that they would only work for 2-dimensional matrices. For each_row, I would take NMatrix object and for each of the iteration, would calculate the index of each of it’s element in the flat list in background then store them in an array and yield this array wrapped as ruby array. For each_column, the methodology was same with just traversing each column and yielding array having elements of each column.

//NMatrix#each_row for 2d-matrices

VALUE nm_each_row(VALUE self) {
  nmatrix* input;
  Data_Get_Struct(self, nmatrix, input);

  VALUE* curr_row = ALLOC_N(VALUE, input->shape[1]);
  for (size_t index = 0; index < input->shape[1]; index++){
    curr_row[index] = INT2NUM(0);
  }

  double* elements = (double*)input->elements;
  for (size_t row_index = 0; row_index < input->shape[0]; row_index++){

    for (size_t index = 0; index < input->shape[1]; index++){
      curr_row[index] = DBL2NUM(elements[(row_index*input->shape[1])+index]);
    }
    //rb_yield(DBL2NUM(elements[row_index]));
    rb_yield(rb_ary_new4(input->shape[1], curr_row));
  }

  return self;
}

Sample #each and #each_with_indices iterators’ usage

[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=> 
[2] pry(main)> n.each do |x|  #iterate each element
[2] pry(main)*   puts x
[2] pry(main)* end  
1.0
2.0
3.0
4.0
5.0
6.0
-7.0
0.0
=> 
[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=> 
[2] pry(main)> n.each_with_indices do |x, i, j, k| #iterate each element with indices
[2] pry(main)*   puts "#{x} with indices: #{i}, #{j}, #{k}"
[2] pry(main)* end  
1.0 with indices: 0, 0, 0
2.0 with indices: 0, 0, 1
3.0 with indices: 0, 1, 0
4.0 with indices: 0, 1, 1
5.0 with indices: 1, 0, 0
6.0 with indices: 1, 0, 1
-7.0 with indices: 1, 1, 0
0.0 with indices: 1, 1, 1
=> 

Later, I would expand these methods to work for N-dimensional matrices too. This would require implementing a generic method NMatrix#rank which would take in the index of dimension along which traversal is done and index of the required part (slice) along that dimension as it’s argument. To traverse along each row, this would be used as rank(0, idx), where 0 denotes that we want part (slice) along the first dimension(i.e. row) and idx is the index of row to be returned. For example, rank(0, 0) would return the first row, rank(0, 1) would return the second row and so on. To traverse along each column, this would be used as rank(1, idx). So, in order to traverse along kth dimension, use rank(k, idx) and yield the returned object for each of the iteration starting from rank(k, 0) and incrementing k after each iteration. NMatrix#each_layer would also work the same way with the use of rank(2, idx). This would be simplified by creating generic methods NMatrix#rank and NMatrix#each_rank and using these inside NMatrix#row, NMatrix#each_row, NMatrix#column, NMatrix#each_column, NMatrix#layer and NMatrxi#each_layer.

def rank(dimension_idx, rank_idx)
  #calculate the corresponding sub-matrix
  #for given dimension_idx and rank_idx
  #using slicing and return the NMatrix object
  #of that slice
end

def each_rank(dimension_idx)
  (0...self.shape[dimension_idx]).each do |idx|
    yield self.rank(dimension_idx, idx)
  end
end

def row(row_idx)
  self.rank(0, row_idx)
end

def column(column_idx)
  self.rank(1, column_idx)
end

def layer(layer_idx)
  self.rank(2, layer_idx)
end

def each_row()
  self.each_rank(0)
end

def each_column()
  self.each_rank(1)
end

def each_layer()
  self.each_rank(2)
end

NMatrix#rank would internally work by doing slicing of the given NMatrix object with only one index (idx in the examples above) along the given dimension and full ranges along all other dimensions. In other words, for a matrix nm having 4 dimensions, nm.rank(0, 3) would return a slice nm[3, :, :, :] which would be a matrix having 3 dimensions. So, next step is to implement slicing which is quite a task in it’s own.

As indexing takes in comma separated indexes and returns a value on that location, slicing takes in comma separated ranges and returns the collection of elements in the given range and wraps this collection in an NMatrix object with corresponding shape and other NMatrix metadata. The dimensions of returned slice is determined by the number of ranges having length greater than one and length of each dimension would be determined by the length of each range given or if whole range is taken, then length of that dimension in the given matrix. For example, for a matrix nm[4, 4, 4, 4, 4], nm[0:2, :, 2, :, :] would return a slice of shape [3, 4, 4, 4]. The ranges for slicing can also be represented by Range objects, like nm[0..1, 0...2, 0..] and to understand how to specify ranges using Range objects, one can refer to Ruby docs on Range.

Corresponding pull requests can be found here at https://github.com/SciRuby/numruby/pull/19.

For coming weeks, I’ll complete the above remaining iterators and start work on slicing and broadcasting. Feel free to share your thoughts in the comments section. Thanks for reading.