GSoC 2019 | Week 5, 6


Another 2 weeks have passed working on NumRuby. I worked on adding broadcasting support and integrating broadcasting to elementwise operations. I also worked on fixing compiler errors which were mostly occurring due to some missed switch cases and some due to unused variables. There were also some minor issues that I encountered while fixing the warning, I also fixed some those. For broadcasting, I went through NumPy blogs to understand how broadcasting has been implemented in NumPy. These blogs have explained broadcasting quite nicely. I have added links to these blogs at the end of this blog, in case if anyone wants to give them a read.


What is broadcasting?

Broadcasting is the process of repeating a matrix along some dimensions to make it compatible for elementwise operations with some other matrix. Broadcasting is a means to do elementwise arithmetic operations when the matrices are of different shapes. The smaller matrix is “broadcast” across the larger matrix so that they have compatible shapes. In order to broadcast, the size of the trailing dimension for both matrices in an operation must either be the same size or one of them must be one. Constraints for broadcasting to happen:

  • If input matrices do not have the same number of dimensions, a “1” will be repeatedly prepended to the shapes of the smaller matrix until both the matrices have the same number of dimensions.
  • The lengths should be same for both matrices along a particular dimension or one of them should be “1”. Broadcasting ensures that matrix with a size of “1” along a particular dimension act as if they had the size of the matrix with the largest shape along that dimension. The value of the matrix elements is assumed to be the same along that dimension for the “broadcast” matrix.

For example:

A (3d matrix):       256 x 256 x 3
B (2d matrix):             256 x 3
Result (3d matrix):  256 x 256 x 3
A (3d matrix):      2 x 5 x 7 x 1
B (1d matrix):          5 x 1 x 8
Result (3d matrix): 2 x 5 x 7 x 8

Following examples are incompatible for broadcasting:

A (2d matrix): 3 x 4
B (2d matrix): 4 x 4
A (2d matrix):     2 x 1
B (3d matrix): 8 x 4 x 3

Rules of Broadcasting

Broadcasting in NumRuby follows a strict set of rules to determine the interaction between the two matrices:

  • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  • Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

Implementation of broadcasting

Broadcasting will only be done for elementwise operators and when these operators are applied on two matrices with different shapes, the #broadcast method will be called and this method will first check the shapes for compatibility of broadcasting. This will be done by comparing each dimension’s length starting from last ones. Then memory will be allocated for new broadcasted matrices and then calculate the resultant broadcasted matrix from these two.

Below are the implementation snippets for broadcasting:

void broadcast_matrices(nmatrix* nmat1, nmatrix* nmat2) {
  size_t* broadcast_shape;
  size_t broadcast_dims;

  //check for broadcasting compatibilty
  //and raise error if incompatible

  get_broadcast_shape(nmat1, nmat2, broadcast_shape, broadcast_dims);

  broadcast_matrix(nmat1, broadcast_shape, broadcast_dims);
  broadcast_matrix(nmat2, broadcast_shape, broadcast_dims);

broadcast_matrices function takes the matrices nmat1, nmat2 and first gets the shape these 2 will get broadcasted to and if these are incompatible for broadcasting, raises an error. Currently the matrices are itself updated but later the implementation will be changed such that another copy will be created which will have broadcasted matrix and original one being unchanged.

void get_broadcast_shape(nmatrix* nmat1, nmatrix* nmat2, size_t* broadcast_shape, size_t broadcast_dims) {
  size_t* shape1 = nmat1->shape;
  size_t* shape2 = nmat2->shape;

  size_t ndims1 = nmat1->ndims;
  size_t ndims2 = nmat2->ndims;

  broadcast_dims = max(ndims1, ndims2);
  broadcast_shape = ALLOC_N(size_t, broadcast_dims);

  if(ndims1 > ndims2) {
    for(size_t i = 0; i < ndims1; ++i) {
      broadcast_shape[i] = shape1[i];
    for(size_t i = 0; i < ndims2; ++i) {
      size_t res_index = (ndims1 - ndims2) + i;
      if(shape1[res_index] != shape2[i] && min(shape1[res_index], shape2[i]) > 1) {
        //raise broadcast compatibility error
      broadcast_shape[res_index] = max(shape1[res_index], shape2[i]);
  else {
    for(size_t i = 0; i < ndims2; ++i) {
      broadcast_shape[i] = shape2[i];
    for(size_t i = 0; i < ndims1; ++i) {
      size_t res_index = (ndims2 - ndims1) + i;
      if(shape1[i] != shape2[res_index] && min(shape1[i], shape2[res_index]) > 1) {
        //raise broadcast compatibility error
      broadcast_shape[res_index] = max(shape1[i], shape2[res_index]);

get_broadcast_shape calculates the broadcast shape for matrices nmat1 and nmat2 and stores it in broadcast_shape and it’s corresponding length in broadcast_dims. It raises an error if it finds that the matrices are incompatible for broadcasting.

void broadcast_matrix(nmatrix* nmat, size_t* new_shape, size_t new_ndims) {
  size_t prev_ndims = nmat->ndims;
  size_t* prev_shape = nmat->shape;

  nmat->ndims = new_ndims;
  nmat->shape = new_shape;

  size_t new_count = 1;
  for(size_t i = 0; i < new_ndims; ++i) {
    new_count *= new_shape[i];
  nmat->count = new_count;

  VALUE* state_array = ALLOC_N(VALUE, new_ndims);
  for(size_t i = 0; i < new_ndims; ++i) {
    state_array[i] = SIZET2NUM(0);

  double* nmat_elements = (double*)nmat->elements;
  double* new_elements = ALLOC_N(double, new_count);

  for(size_t i = 0; i < new_count; ++i){
    size_t nmat_index = get_index_for_broadcast_element(prev_shape, prev_ndims, state_array, new_ndims);
    new_elements[i] = nmat_elements[nmat_index];

    size_t state_index = (nmat->ndims) - 1;
      size_t curr_index_value = NUM2SIZET(state_array[state_index]);

      if(curr_index_value == new_shape[state_index]){
        curr_index_value = 0;
        state_array[state_index] = SIZET2NUM(curr_index_value);
        state_array[state_index] = SIZET2NUM(curr_index_value);


  nmat->elements = new_elements;

Above function broadcast_matrix actually does the broadcasting of matrix. It makes use of new_shape and new_ndims (calculated by get_broadcast_shape) and updates the nmat data members, namely, nmat->count, nmat->shape, nmat->ndims and nmat->elements.

size_t get_index_for_broadcast_element(size_t* prev_shape, size_t prev_ndims, size_t* state_array, size_t new_dims) {
  size_t* indices = ALLOC_N(size_t, prev_ndims);
  for(size_t i = (new_dims - prev_ndims), index = 0; i < new_dims; ++i, ++index) {
    indices[index] = max(state_array[i], prev_shape[index] - 1);

  size_t new_index = 0;
  size_t* stride = ALLOC_N(size_t, prev_ndims);
  size_t val = 1;
  for(size_t i = prev_ndims; i > 0; --i) {
    stride[i - 1] = val;
    val *= prev_shape[i - 1];

  for(size_t i = 0; i < prev_ndims; ++i) {
    new_index += ((size_t)FIX2LONG(indices[i]) * stride[i]);
  return new_index;

When broadcast_matrix function fills up the elements in the broadcasted matrix, it finds the corresponding element in the original matrix which is to be placed in a particular location and to the index of element in original matrix, get_index_for_broadcast_element is used.

These above functions are called internally before doing any elementwise operation. There are also some methods which can be directly called from the NumRuby ruby module which are NumRuby::broadcast_to and NumRuby::broadcast_arrays.

VALUE nm_broadcast_to(VALUE self, VALUE new_shape) {
  nmatrix* nmat;
  Data_Get_Struct(self, nmatrix, nmat);

  size_t new_ndims = (size_t)RARRAY_LEN(new_shape);

  size_t* new_shape = ALLOC_N(size_t, new_ndims);
  for (size_t index = 0; index < new_ndims; index++) {
    new_shape[index] = (size_t)FIX2LONG(RARRAY_AREF(new_shape, index));

  broadcast_matrix(nmat, new_shape, new_ndims);

  return Data_Wrap_Struct(NMatrix, NULL, nm_free, nmat);

nm_broadcast_to can be called from broadcast_to(nm, new_shape) where nm is an NMatrix object and new_shape is a one-dimensional array denoting the shape matrix is to be broadcasted to.


Corresponding pull requests can be found here at and

For coming weeks, I’ll work on implementing wrapper functions in C for LAPACK functionalities. Feel free to share your thoughts in the comments section. Thanks for reading.