Finding distance between a row and the row two above it in R

I would like to efficiently compute distances between every row in a matrix and the row two rows above it in R…

My attempts at finding a dplyr rowwise solution with lag(., n = 2) have failed, and I’m sure there’s a better solution than this for loop.

Thoughts are much appreciated!

library(rdist)
library(tidyverse)

input_labs <- structure(list(sodium = c(140, 152.6, 138, 152.4, 140, 152.6, 
141, 152.7, 141, 152.7), chloride = c(103, 148.9, 104, 149, 102, 
148.8, 103, 148.9, 104, 149), potassium_plas = c(3.4, 0.34, 4.1, 
0.41, 3.7, 0.37, 4, 0.4, 3.7, 0.37), co2_totl = c(31, 3.1, 22, 
2.2, 23, 2.3, 27, 2.7, 20, 2), bun = c(11, 1.1, 5, 0.5, 8, 0.8, 
21, 2.1, 10, 1), creatinine = c(0.84, 0.084, 0.53, 0.053, 0.69, 
0.069, 1.04, 0.104, 1.86, 0.186), calcium = c(9.3, 0.93, 9.8, 
0.98, 9.4, 0.94, 9.4, 0.94, 9.1, 0.91), glucose = c(102, 10.2, 
99, 9.9, 115, 11.5, 94, 9.4, 122, 12.2), anion_gap = c(6, 0.599999999999989, 
12, 1.20000000000001, 15, 1.50000000000001, 11, 1.09999999999998, 
17, 1.69999999999999)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

# Pre-allocate one slot per row; the first two rows have no row two above them
dist_prior <- rep(NA_real_, nrow(input_labs))

for(i in 3:nrow(input_labs)){
  dist_prior[i] <- cdist(input_labs[i,], input_labs[i-2,])
}

Solution:

We could loop over the row indices with map_dbl and apply cdist to each row and the row two above it, then prepend two NAs so the result has one value per row

library(dplyr)
library(rdist)
library(purrr)
# Note: cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()
input_labs %>%
  mutate(dist_prior = c(NA_real_, NA_real_,
    map_dbl(3:n(), ~ cdist(cur_data()[.x, ], cur_data()[.x - 2, ]))))

Output:

# A tibble: 10 × 10
   sodium chloride potassium_plas co2_totl   bun creatinine calcium glucose anion_gap dist_prior
    <dbl>    <dbl>          <dbl>    <dbl> <dbl>      <dbl>   <dbl>   <dbl>     <dbl>      <dbl>
 1   140      103            3.4      31    11        0.84     9.3    102       6          NA   
 2   153.     149.           0.34      3.1   1.1      0.084    0.93    10.2     0.600      NA   
 3   138      104            4.1      22     5        0.53     9.8     99      12          13.0 
 4   152.     149            0.41      2.2   0.5      0.053    0.98     9.9     1.20        1.30
 5   140      102            3.7      23     8        0.69     9.4    115      15          16.8 
 6   153.     149.           0.37      2.3   0.8      0.069    0.94    11.5     1.50        1.68
 7   141      103            4        27    21        1.04     9.4     94      11          25.4 
 8   153.     149.           0.4       2.7   2.1      0.104    0.94     9.4     1.10        2.54
 9   141      104            3.7      20    10        1.86     9.1    122      17          31.5 
10   153.     149            0.37      2     1        0.186    0.91    12.2     1.70        3.15

Alternatively, split both the original data and its lagged copy (n = 2) by row with asplit, then use map2_dbl to loop over the two lists in parallel and apply cdist to each pair of rows

input_labs$dist_prior <- map2_dbl(
  asplit(lag(input_labs, n = 2), 1),
  asplit(input_labs, 1),
  ~ cdist(as.data.frame.list(.x), as.data.frame.list(.y))[, 1])
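
Since efficiency was the goal, a loop-free variant in base R may also be worth considering. This is only a sketch, and it assumes the distance wanted is Euclidean (which, as far as I know, is the default metric of cdist). The helper name row_lag_dist and the toy data frame are hypothetical, made up for illustration. The idea is to subtract the matrix shifted down by two rows from the matrix itself and take row-wise norms in one step.

```r
# Hypothetical helper (not from the original post): Euclidean distance
# between each row of x and the row k positions above it.
# Assumes every column of x is numeric.
row_lag_dist <- function(x, k = 2) {
  m <- as.matrix(x)
  n <- nrow(m)
  # Too few rows to compare anything: return all NA
  if (n <= k) return(rep(NA_real_, n))
  # Rows (k + 1)..n minus rows 1..(n - k), computed in one vectorized step
  diffs <- m[(k + 1):n, , drop = FALSE] - m[1:(n - k), , drop = FALSE]
  # Pad the first k positions with NA so the result has one value per row
  c(rep(NA_real_, k), unname(sqrt(rowSums(diffs^2))))
}

# Toy data, chosen so the distances are easy to check by hand
toy <- data.frame(a = c(0, 1, 3, 1, 0), b = c(0, 1, 4, 1, 0))
toy$dist_prior <- row_lag_dist(toy)
print(toy)
```

On the data in the question this would be called as input_labs$dist_prior <- row_lag_dist(input_labs), before any dist_prior column has been added. It avoids calling cdist once per row, which should matter more as the number of rows grows.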