How can I find the distance between consecutive coordinates in R?

I have a dataframe similar in structure to the one created below:

library(dplyr)  # provides %>% and arrange()
id <- rep(c("a", "b", "c", "d"), each = 3)
date <- seq(as.Date("2019-01-30"), as.Date("2019-02-10"), by="days")
lon <- c(-87.1234, -86.54980, -86.234059, -87.2568, -87.65468, -86.54980, -86.234059, -86.16486, -87.156546, -86.234059, -86.16486, -87.156546)
lat <- c(26.458, 26.156, 25.468, 25.157, 24.154, 24.689, 25.575, 25.468, 25.157, 24.154, 26.789, 26.456)
data <- data.frame(id, date, lon, lat)
data <- data %>% arrange(id, date)

I would like to measure the distance between consecutive points grouped by id. I do not want a distance matrix, which is why I refrain from using raster::pointDistance. I tried separating each unique id into its own sf dataframe (in reality I have ~400 ids so I kind of have to separate for the actual calculation due to the size) and using the following code:

# put rows for each id in their own dataframes
un1 <- unique(data$id)
for (i in seq_along(un1)) {
  assign(paste0("id", i), subset(data, id == un1[i]))
}
# create point distance function
pt.dist <- function(dat) {
  dat$pt.dist <- st_distance(dat, by_element = TRUE)
  return(dat)
}
#run function across every dataframe in working environment
e <- .GlobalEnv
nms <- ls(pattern = "id", envir = e)
for(nm in nms) e[[nm]] <- pt.dist(e[[nm]])

When I run this, all I get is a geometry column with lon and lat listed as a pair. I have also tried segclust2d::calc_dist, as shown below:

distance <- function(dat){calc_dist(dat, coord.names = c("lon", "lat"), smoothed = FALSE)}
for(nm in nms) e[[nm]] <- distance(e[[nm]])

which returns a column where the distances are all 0 meters.

Any help would be greatly appreciated!

Solution:

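The geosphere::dist* functions support this: given a single matrix of lon/lat points, they return the distance between each consecutive pair of rows. That result has one fewer element than there are points, which is why the code below prepends NA. The most accurate is distVincentyEllipsoid (though it may be slower with larger data), followed by distVincentySphere and distHaversine. All of them return distances in meters.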
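
To make the consecutive-point behavior concrete, here is a minimal sketch, assuming the geosphere package is installed. It calls the three functions on a bare matrix built from the first three rows of the question's data. The names `pts`, `d_ellipsoid`, `d_sphere`, and `d_haversine` are only illustrative.

```r
library(geosphere)

# First three points (id "a") from the question, as a lon/lat matrix
pts <- cbind(
  lon = c(-87.1234, -86.54980, -86.234059),
  lat = c(26.458, 26.156, 25.468)
)

# With only one argument, each function measures consecutive rows
d_ellipsoid <- distVincentyEllipsoid(pts)
d_sphere    <- distVincentySphere(pts)
d_haversine <- distHaversine(pts)

length(d_ellipsoid)  # 2: one fewer than the number of points

# Pad with a leading NA so the result lines up with the input rows
c(NA, d_ellipsoid)
```

The three results should differ only slightly over distances like these, so the choice mostly comes down to accuracy versus speed.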

dplyr

library(dplyr)
data %>%
  group_by(id) %>%
  mutate(dist = c(NA, geosphere::distVincentyEllipsoid(cbind(lon, lat)))) %>%
  ungroup()
# # A tibble: 12 x 5
#    id    date         lon   lat    dist
#    <chr> <date>     <dbl> <dbl>   <dbl>
#  1 a     2019-01-30 -87.1  26.5     NA 
#  2 a     2019-01-31 -86.5  26.2  66334.
#  3 a     2019-02-01 -86.2  25.5  82534.
#  4 b     2019-02-02 -87.3  25.2     NA 
#  5 b     2019-02-03 -87.7  24.2 118175.
#  6 b     2019-02-04 -86.5  24.7 126758.
#  7 c     2019-02-05 -86.2  25.6     NA 
#  8 c     2019-02-06 -86.2  25.5  13744.
#  9 c     2019-02-07 -87.2  25.2 105632.
# 10 d     2019-02-08 -86.2  24.2     NA 
# 11 d     2019-02-09 -86.2  26.8 291988.
# 12 d     2019-02-10 -87.2  26.5 105423.
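
With ~400 ids, it may be convenient to wrap the pipeline so the distance function can be swapped. The following is only a sketch: `add_step_dist` and its `dist_fun` argument are hypothetical names, not part of dplyr or geosphere, and the `n() < 2` check is a defensive assumption for ids that have a single row.

```r
library(dplyr)

# Hypothetical helper: adds per-id consecutive distances in meters.
# dist_fun is assumed to accept a lon/lat matrix and return n - 1 distances,
# as the geosphere::dist* functions do when given a single argument.
add_step_dist <- function(df, dist_fun = geosphere::distHaversine) {
  df %>%
    group_by(id) %>%
    mutate(
      # A single-row group has no previous point, so it gets NA
      dist = if (n() < 2) NA_real_ else c(NA_real_, dist_fun(cbind(lon, lat)))
    ) %>%
    ungroup()
}

# Small sample based on the question's data; id "b" has only one row here
pts <- data.frame(
  id  = c("a", "a", "a", "b"),
  lon = c(-87.1234, -86.54980, -86.234059, -87.2568),
  lat = c(26.458, 26.156, 25.468, 25.157)
)

fast    <- add_step_dist(pts)
precise <- add_step_dist(pts, dist_fun = geosphere::distVincentyEllipsoid)
```

Defaulting to distHaversine trades a little accuracy for speed; passing distVincentyEllipsoid reproduces the values shown above for id "a".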

base R

We can get the same result with ave. Because ave operates on a single vector, we pass the row indices as its first argument and let it split them by id. Because it can coerce the return values to the class of its first argument, we convert the row indices to numeric first.

data$dist2 <- ave(
  as.numeric(seq_len(nrow(data))),  # values to use in calc
  data$id,                          # grouping variable(s)
  FUN = function(i) c(NA, geosphere::distVincentyEllipsoid(data[i, c("lon", "lat")]))
)
data
#    id       date       lon    lat     dist2
# 1   a 2019-01-30 -87.12340 26.458        NA
# 2   a 2019-01-31 -86.54980 26.156  66334.13
# 3   a 2019-02-01 -86.23406 25.468  82534.47
# 4   b 2019-02-02 -87.25680 25.157        NA
# 5   b 2019-02-03 -87.65468 24.154 118175.40
# 6   b 2019-02-04 -86.54980 24.689 126757.93
# 7   c 2019-02-05 -86.23406 25.575        NA
# 8   c 2019-02-06 -86.16486 25.468  13743.74
# 9   c 2019-02-07 -87.15655 25.157 105631.82
# 10  d 2019-02-08 -86.23406 24.154        NA
# 11  d 2019-02-09 -86.16486 26.789 291988.42
# 12  d 2019-02-10 -87.15655 26.456 105422.87

Internally, the second call to FUN receives i = c(4, 5, 6) for the "b" id group. The indices within a group do not need to be consecutive; in fact, one strength of ave over other group-processing functions is that it always returns its result in the same order as the input, so it is safe to assign its value back to the original frame.
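
The order-preserving behavior can be checked with a small base R sketch in which the groups are interleaved. The frame `df` and its columns `g`, `v`, and `step` are made up for illustration, and diff stands in for the distance function so that no extra package is needed.

```r
# Toy data with interleaved groups: rows 1, 3, 5 are "x"; rows 2, 4 are "y"
df <- data.frame(
  g = c("x", "y", "x", "y", "x"),
  v = c(1, 10, 3, 40, 6)
)

# Same pattern as above: group the row indices, then index back into the frame
df$step <- ave(
  as.numeric(seq_len(nrow(df))),          # row indices, as numeric
  df$g,                                   # grouping variable
  FUN = function(i) c(NA, diff(df$v[i]))  # per-group consecutive differences
)

df$step
# NA NA  2 30  3
```

Each value lands on the row it was computed for, even though the rows of each group are not adjacent.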
