I’m reading some code that’s highly optimized for speed on arrays, and it’s using colSums in place of apply in several cases.
First: can someone explain why this syntax is valid, please? To my eye it looks as if an argument is left empty, and RStudio also flags these lines as having missing arguments. I even resorted to an AI chatbot, which incorrectly predicted the results and output dimensions when colSums is used this way.
Second: does anyone have a mnemonic or thinking device to help translate mentally between these two equivalent calls? colSums does not seem an intuitive way to handle arrays with more than two dimensions. I understand it’s an optimized way of summing an array along some dimension; it’s just hard to parse mentally.
Reprex:
A <- array(1:(2*3*4), dim=c(2,3,4))
A
colSums(A, ,2)
# equivalent apply statement
apply(A, 3, sum)
Solution:
Because R evaluates function arguments lazily, leaving an argument empty is not a problem unless the function tries to use it and that argument has no default. For example, this runs fine:
foo <- function(a, b, c) {
a + c
}
foo(1, ,5)
# [1] 6
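To see where the rule about using an empty argument actually matters, here is a small sketch contrasting an empty argument that has a default with one that does not. The functions with_default and no_default are hypothetical, made up only for this illustration.

```r
# Hypothetical functions, made up for this illustration.
with_default <- function(a, b = 10, c) {
  # If b is left empty in the call it counts as missing, so evaluating it
  # here falls back to the default value 10.
  a + b + c
}

no_default <- function(a, b, c) {
  # b has no default, so evaluating it after it was left empty is an error.
  a + b + c
}

with_default(1, , 5)  # 16, because b takes its default of 10

# Catch the error so the script keeps running, and keep its message.
msg <- tryCatch(no_default(1, , 5), error = function(e) conditionMessage(e))
print(msg)  # a message saying that argument "b" is missing
```

So an empty slot is only fatal when the function evaluates that argument and has no default to fall back on.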
In colSums(A, ,2), the empty slot is the na.rm parameter. If you look at the source of colSums, you’ll see it passes na.rm along to a call to .Internal, so unlike b in the example above it does get evaluated. That still works because na.rm has a default (na.rm = FALSE): an empty argument counts as missing, and evaluating a missing argument that has a default simply gives the default value.
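A quick way to confirm which argument the empty slot fills is to compare the positional call with a fully named one, and then to put an NA into a copy of the array (B is just an illustrative name) so that the default na.rm = FALSE visibly takes effect. This is a minimal check from the outside, not a look at the internals.

```r
A <- array(1:(2*3*4), dim = c(2, 3, 4))

# The empty slot is na.rm; 2 is matched by position to dims.
positional <- colSums(A, , 2)

# The same call with every argument named (na.rm = FALSE is its default).
named <- colSums(A, na.rm = FALSE, dims = 2)

identical(positional, named)  # TRUE
positional                    # 21 57 93 129, one sum per slice A[, , k]

# na.rm really is used: with an NA in the first slice, the default
# na.rm = FALSE makes that slice's sum NA.
B <- A
B[1, 1, 1] <- NA
colSums(B, , 2)                     # NA 57 93 129
colSums(B, na.rm = TRUE, dims = 2)  # 20 57 93 129 (the NA is dropped)
```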
I guess your second question is about the dims= parameter. The help page (?colSums) says:
dims
integer: Which dimensions are regarded as ‘rows’ or ‘columns’ to sum over. For row*, the sum or mean is over dimensions dims+1, …; for col* it is over dimensions 1:dims.
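Applied to the 2 x 3 x 4 array from the question, that rule gives the result shapes below. This is just a sketch for checking the shapes on your own machine; the comments describe what dim() and length() should report.

```r
A <- array(1:(2*3*4), dim = c(2, 3, 4))

# col*: sum over the leading dimensions 1:dims and keep the rest.
dim(colSums(A, dims = 1))     # 3 4 (dimension 1 summed away)
length(colSums(A, dims = 2))  # 4   (dimensions 1:2 summed away; a plain vector)

# row*: keep the leading dimensions 1:dims and sum over dims+1, ...
length(rowSums(A, dims = 1))  # 2   (dimensions 2:3 summed away; a plain vector)
dim(rowSums(A, dims = 2))     # 2 3 (dimension 3 summed away)
```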
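So since you are using colSums, dims = 2 means to sum over dimensions 1:2, which is the complement of how you would specify it using apply. apply’s MARGIN lists the dimensions to keep, while colSums’ dims lists the leading dimensions to collapse. For your 2 x 3 x 4 array, colSums(A, ,2) collapses dimensions 1 and 2 and keeps dimension 3, which is exactly apply(A, 3, sum). One way to remember it: for an array with n dimensions, colSums(x, dims = k) keeps dimensions (k+1):n, and rowSums(x, dims = k) keeps dimensions 1:k.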
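If it helps to make the complement literal, a pair of tiny helpers can compute the apply MARGIN that matches a given dims value. col_margin and row_margin are hypothetical names, not base R functions; the loop checks the translation against colSums and rowSums for every valid dims on the array from the question.

```r
# Hypothetical helpers (not part of base R): translate a col*/row* `dims`
# value into the MARGIN that apply() needs to give the same result.
col_margin <- function(x, dims) (dims + 1):length(dim(x))  # keep the trailing dims
row_margin <- function(dims) seq_len(dims)                  # keep the leading dims

A <- array(1:(2*3*4), dim = c(2, 3, 4))
nd <- length(dim(A))

for (k in seq_len(nd - 1)) {
  # all.equal() rather than identical(): colSums() returns doubles, while
  # apply(..., sum) on an integer array returns integers.
  stopifnot(isTRUE(all.equal(colSums(A, dims = k), apply(A, col_margin(A, k), sum))))
  stopifnot(isTRUE(all.equal(rowSums(A, dims = k), apply(A, row_margin(k), sum))))
}

col_margin(A, 2)  # 3, so colSums(A, , 2) matches apply(A, 3, sum)
```

The loop stops at nd - 1 because colSums and rowSums reject a dims value equal to the number of dimensions.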