GSoC 2019 | Week 1, 2
Background
NMatrix is the SciRuby’s fast numerical linear algebra library for Ruby, with dense and sparse matrices. It’s written in C and C++ (and with experimental JRuby support) and with some parts written in Ruby. As the source code is written in such way that data is interchanged between modules written in different languages, it requires converting the data(usually by wrapping and unwrapping) and this causes unnecessary overhead to computation. To overcome this, SciRuby contributors came up with NumRuby which is a re-implementation of NMatrix with modules re-organization such that the API can be easily understood by users and provide them with more richer API.
NumRuby can be viewed as a package for scientific numerical computation in Ruby. It consists of:
- NMatrix: An API rich N-dimensional array object.
- Linalg: NumRuby’s linear algebra capabilities.
NumRuby has been implemented such that all of the computation is done only in C and only initial and final data is interchanged between Ruby and C which eliminates unnecessary overheads. To see the speed improvements over previous NMatrix, benchmarks were written and can be found at https://github.com/SciRuby/numruby/tree/master/benchmark/bench.rb
. The output of benchmarks was as follows:
#format: [dimensions length, time taken by operation]
{
:addition => [
[100, 6.0e-06],
[2500, 1.0e-05],
[10000, 3.5e-05],
[250000, 0.000653],
[1000000, 0.00308],
[4000000, 0.018201],
[9000000, 0.068491],
[16000000, 0.048806],
[25000000, 0.07285]
],
:subtraction => [
[100, 4.0e-06],
[2500, 8.0e-06],
[10000, 3.2e-05],
[250000, 0.000754],
[1000000, 0.003138],
[4000000, 0.011785],
[9000000, 0.065212],
[16000000, 0.109191],
[25000000, 0.072283]
],
:mat_mult => [
[100, 4.3e-05],
[2500, 5.4e-05],
[10000, 0.006222],
[250000, 0.022299],
[1000000, 0.122001],
[4000000, 0.699966],
[9000000, 2.155848],
[16000000, 4.650079],
[25000000, 10.762429]
]
}
The speedups as compared to previous NMatrix are as follows (for 25k elements):
- Matrix addition: 100x
- Matrix subtraction: 100x
- Matrix multiplication(dgemm): 100x
Introduction
I have been working on NumRuby for 2 weeks now. I worked on adding indexing and iteration support. I started off with understanding how current indexing of 2-dimensional matrices is implemented. As I dig deep into the implementation, I saw that the code was written such that it can only work fro 2-d matrices. I designed a rough idea of how this should be to also support arbitrary-dimensional matrices. I also took inspiration from previous Nmatrix code and also went through numo-array code to get any useful insight.
Indexing
During the first week of GSoC coding period, I worked on adding indexing support for arbitrary dimensional matrices. Till now, NMatrix only supported 2-dimensional matrices and the code assumed NMatrix object to be of 2 dimensions. I first needed to modify code to store no. of dimensions and then write indexing conversions to map comma separated indexes to the index of corresponding element in the flat list of elements stored at back-end. This was done pretty easily with the help of strides which are the prefixes for each comma separated index value in the equation to calculate required index in flat list. Once the indexing part was done and working, I wrote the setters and getters for matrix elements. I also wrote some tests for indexing(setting and getting).
void get_stride(nmatrix* nmat, size_t* stride){
size_t val = 1;
for(int i = (nmat->ndims)-1; i >= 0; --i){ //using int here instead of size_t
stride[i] = val; //because size_t does not support
val *= nmat->shape[i]; //decrement operator
}
}
The above function get_stride
takes the matrix nmat
, calculates stride from nmat->shape[]
and store in stride[]
. Strides, in other words, are tuple of bytes to step in each dimension when traversing an array. More details on strides can be found here. Stride can simplify element location calculation a lot as one just need to multiply corresponding indices and stride value and add all products to get the location.
size_t get_index(nmatrix* nmat, VALUE* indices){
size_t index = 0;
size_t* stride = ALLOC_N(size_t, nmat->ndims);
get_stride(nmat, stride);
for(size_t i = 0; i < nmat->ndims; ++i){
if((size_t)FIX2LONG(indices[i]) >= nmat->shape[i] ||
(int)FIX2LONG(indices[i]) < 0) { //index out of bounds
rb_raise(rb_eIndexError, "IndexError: index is out of bounds.");
}
index += ((size_t)FIX2LONG(indices[i]) * stride[i]);
}
return index;
}
get_index
takes the matrix nmat
and indices
(input indices tuple) and uses stride from get_stride
to calculate the location of element in flat list of elements. It also check for any out of bounds index access and throws exception in case if any index is out of bounds.
Sample indexing usage
[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=>
[2] pry(main)> n[0, 0, 0] #getter using indexing
=> 1.0
[3] pry(main)> n[0, 0, 1]
=> 2.0
[4] pry(main)> n[1, 1, 1]
=> 0.0
[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=>
[2] pry(main)> n[0, 1, 0]
=> 3.0
[3] pry(main)> n[0, 1, 0] = 10 #setter using indexing
=> 10
[4] pry(main)> n[0, 1, 0]
=> 10.0
[5] pry(main)> n.elements #print all elements of n
=> [1.0, 2.0, 10.0, 4.0, 5.0, 6.0, -7.0, 0.0]
Iteration
The second week of GSoC was spent in implementing iterators for NMatrix. NMatrix#each
was easy to implement as it was simply yielding each element from the flat list of elements. It seemed to me that other iterators would also be as easy to implement but it wasn’t so as it took quite a lot of time to implement iterators with indexes. Iterators with indexes yield both the current element and the associated comma separated indexes with it. For increment of comma separated indexes after each iteration, I wrote function increment_index
which would increment current values to next one.
void increment_state(VALUE* state_array, VALUE* shape_array, size_t ndims) {
for (size_t index = ndims; index > 0; index--) {
int curr_dim_index = (int)NUM2INT(state_array[index]);
int curr_dim_length = (int)NUM2INT(shape_array[index - 1]);
if (curr_dim_index + 1 == curr_dim_length) {
curr_dim_index = 0;
state_array[index] = INT2NUM(curr_dim_index);
} else {
curr_dim_index++;
state_array[index] = INT2NUM(curr_dim_index);
break;
}
}
}
increment_state
takes the current state of indices tuple and increments it to next state. For example, a matrix nm
has shape [2, 2, 3]
, then indices tuple [0, 0, 0]
will be incremented to [0, 0, 1]
, [0, 0, 1]
to [0, 0, 2]
and that to [0, 1, 0]
and so on. [0, 1, 2]
to [1, 0, 0]
and it will increment [1, 1, 2]
to [0, 0, 0]
.
List of iterators implemented:
NMatrix#each
NMatrix#each_with_indices
NMatrix#each_row
NMatrix#each_column
Iterators yet to be implemented:
NMatrix#each_layer
NMatrix#each_rank
NMatrix#each_stored_with_indices
NMatrix#each_ordered_stored_with_indices
NMatrix#map_stored
Initially, I wrote each_row
and each_column
such that they would only work for 2-dimensional matrices. For each_row
, I would take NMatrix object and for each of the iteration, would calculate the index of each of it’s element in the flat list in background then store them in an array and yield this array wrapped as ruby array. For each_column
, the methodology was same with just traversing each column and yielding array having elements of each column.
//NMatrix#each_row for 2d-matrices
VALUE nm_each_row(VALUE self) {
nmatrix* input;
Data_Get_Struct(self, nmatrix, input);
VALUE* curr_row = ALLOC_N(VALUE, input->shape[1]);
for (size_t index = 0; index < input->shape[1]; index++){
curr_row[index] = INT2NUM(0);
}
double* elements = (double*)input->elements;
for (size_t row_index = 0; row_index < input->shape[0]; row_index++){
for (size_t index = 0; index < input->shape[1]; index++){
curr_row[index] = DBL2NUM(elements[(row_index*input->shape[1])+index]);
}
//rb_yield(DBL2NUM(elements[row_index]));
rb_yield(rb_ary_new4(input->shape[1], curr_row));
}
return self;
}
Sample #each and #each_with_indices iterators’ usage
[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=>
[2] pry(main)> n.each do |x| #iterate each element
[2] pry(main)* puts x
[2] pry(main)* end
1.0
2.0
3.0
4.0
5.0
6.0
-7.0
0.0
=>
[1] pry(main)> n = NMatrix.new [2, 2, 2], [1, 2, 3, 4, 5, 6, -7, 0]
=>
[2] pry(main)> n.each_with_indices do |x, i, j, k| #iterate each element with indices
[2] pry(main)* puts "#{x} with indices: #{i}, #{j}, #{k}"
[2] pry(main)* end
1.0 with indices: 0, 0, 0
2.0 with indices: 0, 0, 1
3.0 with indices: 0, 1, 0
4.0 with indices: 0, 1, 1
5.0 with indices: 1, 0, 0
6.0 with indices: 1, 0, 1
-7.0 with indices: 1, 1, 0
0.0 with indices: 1, 1, 1
=>
Later, I would expand these methods to work for N-dimensional matrices too. This would require implementing a generic method NMatrix#rank
which would take in the index of dimension along which traversal is done and index of the required part (slice) along that dimension as it’s argument. To traverse along each row, this would be used as rank(0, idx)
, where 0 denotes that we want part (slice) along the first dimension(i.e. row) and idx is the index of row to be returned. For example, rank(0, 0)
would return the first row, rank(0, 1)
would return the second row and so on. To traverse along each column, this would be used as rank(1, idx)
. So, in order to traverse along kth dimension, use rank(k, idx)
and yield the returned object for each of the iteration starting from rank(k, 0)
and incrementing k after each iteration. NMatrix#each_layer
would also work the same way with the use of rank(2, idx)
. This would be simplified by creating generic methods NMatrix#rank
and NMatrix#each_rank
and using these inside NMatrix#row
, NMatrix#each_row
, NMatrix#column
, NMatrix#each_column
, NMatrix#layer
and NMatrxi#each_layer
.
def rank(dimension_idx, rank_idx)
#calculate the corresponding sub-matrix
#for given dimension_idx and rank_idx
#using slicing and return the NMatrix object
#of that slice
end
def each_rank(dimension_idx)
(0...self.shape[dimension_idx]).each do |idx|
yield self.rank(dimension_idx, idx)
end
end
def row(row_idx)
self.rank(0, row_idx)
end
def column(column_idx)
self.rank(1, column_idx)
end
def layer(layer_idx)
self.rank(2, layer_idx)
end
def each_row()
self.each_rank(0)
end
def each_column()
self.each_rank(1)
end
def each_layer()
self.each_rank(2)
end
NMatrix#rank
would internally work by doing slicing of the given NMatrix object with only one index (idx in the examples above) along the given dimension and full ranges along all other dimensions. In other words, for a matrix nm
having 4 dimensions, nm.rank(0, 3)
would return a slice nm[3, :, :, :]
which would be a matrix having 3 dimensions. So, next step is to implement slicing which is quite a task in it’s own.
As indexing takes in comma separated indexes and returns a value on that location, slicing takes in comma separated ranges and returns the collection of elements in the given range and wraps this collection in an NMatrix object with corresponding shape and other NMatrix metadata. The dimensions of returned slice is determined by the number of ranges having length greater than one and length of each dimension would be determined by the length of each range given or if whole range is taken, then length of that dimension in the given matrix. For example, for a matrix nm[4, 4, 4, 4, 4]
, nm[0:2, :, 2, :, :]
would return a slice of shape [3, 4, 4, 4]
. The ranges for slicing can also be represented by Range
objects, like nm[0..1, 0...2, 0..]
and to understand how to specify ranges using Range
objects, one can refer to Ruby docs on Range
.
Corresponding pull requests can be found here at https://github.com/SciRuby/numruby/pull/19.
For coming weeks, I’ll complete the above remaining iterators and start work on slicing and broadcasting. Feel free to share your thoughts in the comments section. Thanks for reading.