I have an R function that calculates the Hamming distance of two vectors:
Hamming = function(x, y){
  get_dist = sum(x != y, na.rm = TRUE)
  return(get_dist)
}
that I would like to apply to every row of two matrices M1, M2 without using a for loop. What I currently have (where L is the number of rows in M1 and M2) is the very time-consuming loop:
xdiff = c()
for(i in 1:L){
  xdiff = c(xdiff, Hamming(M1[i,], M2[i,]))
}
I thought that this could be done by executing
mapply(Hamming, t(M1), t(M2))
(with the transpose because mapply works across columns), but this doesn’t generate a length-L vector with one Hamming distance per row, so perhaps I’m misunderstanding what mapply is doing.
Is there a straightforward application of mapply or something else in the R apply family that would work?
Solution:
If dim(M1) and dim(M2) are identical, then you can simply do:
rowSums(M1 != M2, na.rm = TRUE)
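As a small worked example of the one-liner above (the matrices here are made up purely for illustration), M1 != M2 produces a logical matrix of the same shape, and rowSums counts the TRUE entries in each row while na.rm = TRUE drops any comparison involving NA:

```r
# Hypothetical 2 x 3 example matrices, not taken from the question.
M1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 0,  3,
               9, NA, 0), nrow = 2, byrow = TRUE)

# Elementwise comparison gives a logical matrix with the same dimensions;
# comparisons against NA yield NA.
mismatch <- M1 != M2

# Count mismatches per row, ignoring NA comparisons.
row_dist <- rowSums(mismatch, na.rm = TRUE)
print(row_dist)
```

Row 1 differs in one position; row 2 differs in two positions once the NA comparison is dropped.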
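Your attempt with mapply didn’t work because an m-by-n matrix is stored as a vector of length m*n (in column-major order), and mapply iterates over it element by element. Hamming is therefore called on pairs of single values, so the result has length m*n instead of L, and transposing only changes the order of the elements. To accomplish this with mapply, you would need to split each matrix into a list of row vectors: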
mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))
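The following sketch contrasts the two mapply calls on made-up matrices (the data is illustrative only, and asplit needs R 3.6.0 or later). It suggests that passing the matrices directly yields one value per element, while splitting by row first yields one value per row, matching the rowSums result:

```r
Hamming <- function(x, y) {
  sum(x != y, na.rm = TRUE)
}

# Hypothetical 2 x 3 example matrices, not taken from the question.
M1 <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE)
M2 <- matrix(c(1, 0,  3,
               9, NA, 0), nrow = 2, byrow = TRUE)

# Matrices passed directly: mapply walks over all 6 elements one at a time.
elementwise <- mapply(Hamming, M1, M2)
length(elementwise)  # one entry per matrix element, not per row

# Split along margin 1 (rows) first: mapply now receives pairs of row vectors.
by_row <- mapply(Hamming, asplit(M1, 1L), asplit(M2, 1L))

# The vectorized version should agree with the row-wise mapply.
vectorized <- rowSums(M1 != M2, na.rm = TRUE)
all(by_row == vectorized)
```

For large matrices the rowSums form is likely to be the faster of the two, since it avoids calling an R function once per row.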