Why is `colSums(A, ,2)` valid syntax? – looks like an empty argument

May 16, 2023

I’m reading some code that’s highly optimized for speed on arrays, and it’s using colSums in place of apply in several cases.

First: can someone explain why this syntax is valid? To my eye it looks as if an argument has been left empty, and RStudio also flags these lines as having missing arguments. I even resorted to an AI chatbot, which incorrectly predicted the results and output dimensions of colSums used this way.

Second: does anyone have a mnemonic or mental model for translating between these two equivalent calls? colSums does not seem an intuitive way to handle arrays with more than two dimensions. I understand it's an optimized way of summing an array over some of its dimensions; it's just hard to parse mentally.

Reprex:

```r
A <- array(1:(2*3*4), dim=c(2,3,4))
A
colSums(A, ,2)
# equivalent apply statement
apply(A, 3, sum)
```
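Running the two calls side by side shows what the reprex actually returns: a plain length-4 vector with one total per slice `A[, , k]`. One small difference worth knowing is that `colSums()` always returns doubles, while `apply(A, 3, sum)` keeps the integer type of `sum()` on integer input, so the results are equal in value but not `identical()`.

```r
A <- array(1:(2*3*4), dim = c(2, 3, 4))

cs <- colSums(A, , 2)    # sums over dimensions 1:2, one value per slice A[, , k]
ap <- apply(A, 3, sum)   # MARGIN = 3: keep dimension 3, sum over everything else

cs
# [1]  21  57  93 129
typeof(cs)               # "double"
typeof(ap)               # "integer"

identical(cs, ap)        # FALSE, only because the types differ
all.equal(cs, ap)        # TRUE, the values match
```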

Solution:

Because R evaluates function arguments lazily, a missing argument is not a problem unless the function actually tries to use it. For example, this runs fine:

```r
foo <- function(a, b, c) {
  a + c
}

foo(1, ,5)
# [1] 6
```
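The flip side of lazy evaluation is that the empty slot only causes an error once the body touches it. The sketch below uses two hypothetical functions (`use_all()` and `check_b()`, not from the original code): the first uses `b` and fails at evaluation time, the second uses base R's `missing()` to detect the empty slot without evaluating it.

```r
# Hypothetical variant of foo() that does use b
use_all <- function(a, b, c) {
  a + b + c
}

# The call parses fine; the error is raised only when b is evaluated
msg <- tryCatch(use_all(1, , 5), error = function(e) conditionMessage(e))
print(msg)   # an "argument ... is missing, with no default" error message

# missing() checks for an empty slot without forcing it
check_b <- function(a, b, c) if (missing(b)) "b left empty" else "b supplied"
check_b(1, , 5)
# [1] "b left empty"
```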

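The empty slot here is the second argument, `na.rm`. `colSums` is defined as `colSums(x, na.rm = FALSE, dims = 1L)`, so when that position is left empty R treats `na.rm` as missing and falls back to its default, `FALSE`. If you look at the source of `colSums`, you'll see the actual summing is done by a call to `.Internal(colSums(...))`; by the time `na.rm` is passed along there, it simply evaluates to that default. So `colSums(A, ,2)` is the same call as `colSums(A, na.rm = FALSE, dims = 2)`.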
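A quick way to convince yourself that the empty slot picks up the default is a stand-in function with the same formals as `colSums`. `show_args()` is a hypothetical helper that just echoes its arguments; the last lines compare the positional call against the fully named one on the real `colSums`.

```r
A <- array(1:(2*3*4), dim = c(2, 3, 4))

# colSums' own formal arguments, in positional order
names(formals(colSums))   # "x" "na.rm" "dims"

# Hypothetical stand-in with the same formals as colSums(x, na.rm = FALSE, dims = 1L)
show_args <- function(x, na.rm = FALSE, dims = 1L) {
  list(na.rm = na.rm, dims = dims)
}
str(show_args(A, , 2))    # na.rm falls back to FALSE, dims is 2

# The positional call with an empty slot matches the named call
identical(colSums(A, , 2), colSums(A, na.rm = FALSE, dims = 2))   # TRUE
```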

I take it your second question is about the `dims=` parameter. The help page says:

> `dims`: integer: Which dimensions are regarded as ‘rows’ or ‘columns’ to sum over. For `row*`, the sum or mean is over dimensions `dims+1, …`; for `col*` it is over dimensions `1:dims`.

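So with `colSums`, `dims = 2` means "sum over dimensions 1:2", which leaves one total for each index of the remaining dimension (dimension 3 here). `apply()` is specified the other way round: its `MARGIN` argument names the dimensions to keep. One way to remember it: `colSums`' `dims` counts the leading dimensions that get collapsed, while `apply`'s `MARGIN` lists the dimensions that survive. So `colSums(A, dims = k)` corresponds to `apply(A, (k+1):length(dim(A)), sum)`, and `rowSums(A, dims = k)` to `apply(A, 1:k, sum)`. Naming the argument, as in `colSums(A, dims = 2)`, also avoids the confusing empty slot.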
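The translation rule can be checked mechanically. The helpers below (`col_sums_via_apply()` and `row_sums_via_apply()`) are hypothetical names used only to illustrate the mapping; they are slower than the built-ins and exist just to show which `MARGIN` each `dims` value corresponds to, here on a 4-dimensional array.

```r
# Hypothetical helper: colSums(x, dims = k) keeps dimensions (k+1):n
col_sums_via_apply <- function(x, dims = 1L) {
  n <- length(dim(x))
  stopifnot(dims >= 1L, dims < n)   # same valid range colSums accepts
  apply(x, (dims + 1L):n, sum)      # MARGIN = the dimensions that survive
}

# Hypothetical helper: rowSums(x, dims = k) keeps dimensions 1:k
row_sums_via_apply <- function(x, dims = 1L) {
  apply(x, seq_len(dims), sum)
}

B <- array(1:120, dim = c(2, 3, 4, 5))

# Both comparisons should be TRUE; all.equal() ignores the double/integer difference
all.equal(colSums(B, dims = 2), col_sums_via_apply(B, dims = 2))   # 4 x 5 result
all.equal(rowSums(B, dims = 2), row_sums_via_apply(B, dims = 2))   # 2 x 3 result
```

For the 3-dimensional `A` in the question, `col_sums_via_apply(A, dims = 2)` reduces to exactly `apply(A, 3, sum)`.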