GSoC 2019 | Week 7, 8

Introduction

During these weeks, I started work on Linear algebra functionalities of NumRuby. This is being done by implementing Linear algebra methods under the module Numruby::Linalg and these methods then makes use of LAPACK and BLAS routines but these can’t be directly called from Ruby code, so there needs to be some wrappers functions in C which expose these functions to be able to be called from Ruby code. I implemented most of the required wrapper functions and will later use them to implement NumRuby::Linalg methods. I went through source code of numpy.linalg and Numo::Linalg to have a gist of ways the Linear algebra capabilities could be implemented.

BLAS and LAPACK

LAPACK and BLAS are libraries for linear algebra functionalities written in FORTRAN and are used as underlying for linear algebra capabilities in nearly every numerical library in modern languages be int NMatrix, Numo in ruby or NumPy in Python and the reason primarily being their performance and efficiency.

Wikipedia defines BLAS as following, “Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. They are the de facto standard low-level routines for linear algebra libraries; the routines have bindings for both C and Fortran. Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions.” an defines LAPACK as following, “LAPACK (Linear Algebra Package) is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008). The routines handle both real and complex matrices in both single and double precision.”. One can explore BLAS and LAPACK here at http://www.netlib.org/blas/ and http://www.netlib.org/lapack/.

NumRuby::Linalg

NumRuby::Linalg is the module of NumRuby which provides linear algebra functionalities. It provides methods for matrix decomposition, inverse, dot product, matrix norm & determinant and linear equations solution. My initial approach to implementing these methods was to write the module completely in C as I thought it would be the most optimal in terms of speed if implemented this way. But, there were some problems with this approach which I later realized when I started implementing QR decomposition. The main problem was that it took too long to implement just QR decomposition as many small things done Ruby using just an expression or a statement took multiple lines to be implemented. For example, declaring and initializing an NMatrix object can be done using a simple statement but to do the same thing in C, one has to allocate memory for shape, elements and initialize all parameters manually.

The way Linalg has been implemented in most libraries is that the methods have been written in the same language (Ruby in our case) and the LAPACK routines are directly called from these methods. The problem here is LAPACK routines can’t be directly called from Ruby as there aren’t any bindings of LAPACK for Ruby but there are libraries in C which lets one invoke LAPACK routines through C code. This way is much better in terms of complexity of code and there was hardly any performance difference. So, now the task was to write functions in C which would act as interface between LAPACK and Linalg methods in Ruby.

Wrappers for LAPACK routines

To implement the wrappers functions in C, I listed out all the LAPACK routines to be used in currently planned Linalg module. Then forward declared wrapper functions for all these routines in ruby_nmatrix.c. I also implemented constructors for nmatrix struct, namely, nmatrix_new and nmatrix_copy. These constructors reduced code repetition in wrappers functions as now any new nmatrix instance can be declared and initialized in one line and copy of nmatrix instance can be done in one line.

nmatrix* nmatrix_new(
  nm_dtype dtype,
  nm_stype stype,
  size_t ndims,
  size_t count,
  size_t* shape,
  void* elements
  ) {
  nmatrix* matrix = ALLOC(nmatrix);
  matrix->dtype = dtype;
  matrix->stype = stype;
  matrix->ndims = ndims;
  matrix->count = count;

  matrix->shape = ALLOC_N(size_t, matrix->ndims);
  if(shape != NULL) {
    for(size_t i = 0; i < ndims; ++i) {
      matrix->shape[i] = shape[i];
    }
  }

  if(elements == NULL) {
    return matrix;
  }

  double* temp_elements = (double*)elements;
  double* matrix_elements = ALLOC_N(double, matrix->count);
  for(size_t i = 0; i < count; ++i) {
    matrix_elements[i] = temp_elements[i];
  }
  matrix->elements = matrix_elements;

  return matrix;
}
nmatrix* matrix_copy(nmatrix* original_matrix) {
  nmatrix* matrix = ALLOC(nmatrix);
  matrix->dtype = original_matrix->dtype;
  matrix->stype = original_matrix->stype;
  matrix->ndims = original_matrix->ndims;
  matrix->count = original_matrix->count;

  matrix->shape = ALLOC_N(size_t, matrix->ndims);
  if(original_matrix->shape != NULL) {
    for(size_t i = 0; i < original_matrix->ndims; ++i) {
      matrix->shape[i] = original_matrix->shape[i];
    }
  }

  if(original_matrix->elements == NULL) {
    return matrix;
  }

  double* temp_elements = (double*)original_matrix->elements;
  double* matrix_elements = ALLOC_N(double, matrix->count);
  for(size_t i = 0; i < original_matrix->count; ++i) {
    matrix_elements[i] = temp_elements[i];
  }
  matrix->elements = matrix_elements;

  return matrix;
}

Example invocation for these constructors is nmatrix* result = nmatrix_new(matrix_qr->dtype, matrix_qr->stype, 2, matrix_qr->count, matrix_qr->shape, NULL); and nmatrix* result = nmatrix_copy(matrix); respectively.

Each of the wrappers was written in lapack.c as follows:

VALUE nm_geqrf(int argc, VALUE* argv) {
  nmatrix* matrix;
  Data_Get_Struct(argv[0], nmatrix, matrix);

  int m = matrix->shape[0]; //no. of rows
  int n = matrix->shape[1]; //no. of cols
  int lda = n, info = -1;

  nmatrix* result_qr = nmatrix_new(matrix->dtype, matrix->stype, 2, matrix->count, matrix->shape, NULL);
  nmatrix* result_tau = nmatrix_new(matrix->dtype, matrix->stype, 1, min(m, n), NULL, NULL);
  result_tau->shape[0] = min(m, n);

  switch(matrix->dtype) {
    case nm_bool:
    {
      //not supported error
      break;
    }
    case nm_int:
    {
      //not supported error
      break;
    }
    case nm_float32:
    {
      float* tau_elements = ALLOC_N(float, min(m, n));
      float* elements = ALLOC_N(float, matrix->count);
      memcpy(elements, matrix->elements, sizeof(float)*matrix->count);
      info = LAPACKE_sgeqrf(LAPACK_ROW_MAJOR, m, n, elements, lda, tau_elements);

      result_qr->elements = elements;
      result_tau->elements = tau_elements;

      VALUE qr = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_qr);
      VALUE tau = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_tau);
      return rb_ary_new3(2, qr, tau);
      break;
    }
    case nm_float64:
    {
      double* tau_elements = ALLOC_N(double, min(m, n));
      double* elements = ALLOC_N(double, matrix->count);
      memcpy(elements, matrix->elements, sizeof(double)*matrix->count);
      info = LAPACKE_dgeqrf(LAPACK_ROW_MAJOR, m, n, elements, lda, tau_elements);

      result_qr->elements = elements;
      result_tau->elements = tau_elements;

      VALUE qr = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_qr);
      VALUE tau = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_tau);
      return rb_ary_new3(2, qr, tau);
      break;
    }
    case nm_complex32:
    {
      float complex* tau_elements = ALLOC_N(float complex, min(m, n));
      float complex* elements = ALLOC_N(float complex, matrix->count);
      memcpy(elements, matrix->elements, sizeof(float complex)*matrix->count);
      info = LAPACKE_cgeqrf(LAPACK_ROW_MAJOR, m, n, elements, lda, tau_elements);

      result_qr->elements = elements;
      result_tau->elements = tau_elements;

      VALUE qr = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_qr);
      VALUE tau = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_tau);
      return rb_ary_new3(2, qr, tau);
      break;
    }
    case nm_complex64:
    {
      double complex* tau_elements = ALLOC_N(double complex, min(m, n));
      double complex* elements = ALLOC_N(double complex, matrix->count);
      memcpy(elements, matrix->elements, sizeof(double complex)*matrix->count);
      info = LAPACKE_zgeqrf(LAPACK_ROW_MAJOR, m, n, elements, lda, tau_elements);

      result_qr->elements = elements;
      result_tau->elements = tau_elements;

      VALUE qr = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_qr);
      VALUE tau = Data_Wrap_Struct(NMatrix, NULL, nm_free, result_tau);
      return rb_ary_new3(2, qr, tau);
      break;
    }
  }
  return INT2NUM(-1);
}

Above function nm_geqrf is a wrapper for LAPACK routines sgeqrf, dgeqrf, cgeqrf and zgeqrf. This function takes the input matrix (or matrices) and relevant arguments, parses them, declares and initializes result matrices, uses switch statement to decide which LAPACK routine to call and then passes the required arguments to the routine call and finally returns the resulting matrix (or matrices) as Ruby objects. This wrapper can be called from Ruby code using NumRuby::Lapack.geqrf(matrix). Similarly, other wrapper functions have been implemented.

Corresponding pull request can be found here at https://github.com/SciRuby/numruby/pull/26.

After this, I’ll work on implementing linear algebra methods for NumRuby::Linalg using these wrappers for LAPACK routines. Feel free to share your thoughts in the comments section. Thanks for reading.