I have a set of 52 spatial points that I'm trying to separate into groups based on a distance of 11 km, i.e. groups in which no two points are more than 11 km apart. I'd then like to record each point's group in a new column of the data frame.
I found a couple of posts, but they haven't got me quite where I want, as they require setting the number of clusters in advance (I don't mind how many clusters there are). I tried adapting this post, but it gave me 52 different groups, which doesn't seem right since many points are within 11 km of each other, so I'm looking for other options.
df <- structure(list(station = c("BE01", "BE02", "BEUWM01", "BL01",
"BL02", "PB01", "PB02", "PB03", "PB04", "PB05", "PB06", "PB07",
"PB09", "PB10", "PB11", "PB12", "PB13", "PB14", "PB15", "PB16",
"PB17", "PB18", "PB19", "PB20", "PB21", "PB22", "PB23", "PB24",
"PB25", "PB26", "PB27", "PB28", "PB29", "PB30", "PB4G01", "PB4G02",
"PBUWM01", "PBUWM02", "SA01", "SA02", "SA02b", "SA03", "SA04",
"SA05", "SA06", "SA07", "SA11", "SAUWM01", "VB01", "VB02", "VB03",
"VB04"), longitude = c(71.6546833333333, 71.6748333333333, 71.66293,
72.4337833333333, 72.4347, 71.7342, 71.7632, 71.7992, 71.8092,
71.8326916667, 71.8405, 71.8796, 71.96835, 71.9697666666667,
71.9727, 71.9745666666667, 71.9385075, 71.8685, 71.8524, 71.8414,
71.8294, 71.758275, 71.7578, 71.7468, 71.9733, 71.9795, 71.9741,
71.9209, 71.8959, 71.8228, 71.7498, 71.7323, 71.9068, 71.7474,
71.9398, 71.8329, 71.98115, 71.75197, 72.24793, 72.241, 72.23027,
72.2569, 72.2812, 72.1980666667, 72.2116, 72.221, 72.2636, 72.24754,
72.2155, 72.2405, 72.2156, 72.2488), latitude = c(-5.25671666666667,
-5.2662, -5.24915, -5.2579, -5.2432, -5.2815, -5.2459, -5.2461,
-5.2448, -5.23439583333, -5.2567, -5.2694, -5.24165, -5.33015,
-5.3344, -5.37878333333333, -5.39709575, -5.4271, -5.4229, -5.4308,
-5.4406, -5.456505, -5.3823, -5.3512, -5.2695, -5.3039, -5.3521,
-5.4126, -5.4243, -5.4644, -5.3957, -5.3181, -5.2668, -5.4261,
-5.2582, -5.445, -5.33995, -5.38898, -5.31183, -5.3056, -5.316745,
-5.2985, -5.3347, -5.35026666667, -5.3686, -5.3225, -5.3327,
-5.30114, -5.5452, -5.5207, -5.5247, -5.546)), row.names = c(NA,
-52L), class = "data.frame")
Solution:
We can proceed as follows. First, convert your x/y co-ordinates to sf points:
library(sf)
points <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)
We can then get a 52 x 52 distance matrix giving the distance (in metres) between each pair of stations:
adj <- st_distance(points)
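As a quick check on units: for longitude/latitude coordinates (crs 4326), `st_distance()` returns its result as a `units` object in metres, which is why the 11 km threshold is written as 11000. A minimal sketch using the first two stations from the question (the names `two`, `two_sf` and `d_m` are only for illustration):

```r
library(sf)

# First two stations from the question's data (BE01 and BE02)
two <- data.frame(
  station   = c("BE01", "BE02"),
  longitude = c(71.6546833333333, 71.6748333333333),
  latitude  = c(-5.25671666666667, -5.2662)
)
two_sf <- st_as_sf(two, coords = c("longitude", "latitude"), crs = 4326)

# st_distance() gives a 2 x 2 matrix of class "units";
# as.numeric() drops the unit, and element 2 is the BE02-to-BE01 distance
d_m <- as.numeric(st_distance(two_sf))[2]
d_m  # roughly 2.5 km, expressed in metres
```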
Furthermore, we can turn this into a binary matrix telling us whether each pair of stations is within 11km of each other:
adj <- matrix(as.numeric(as.numeric(adj) < 11000), nrow = nrow(adj))
Note that this is an adjacency matrix, so we can easily turn it into a graph:
library(igraph)
g <- graph_from_adjacency_matrix(adj, mode = "undirected", diag = FALSE)
If we plot this graph, we see 4 connected components. Within each component, every station can be reached from any other through a chain of steps of less than 11 km each:
plot(g)
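It is worth being clear about what a connected component guarantees. A toy sketch, using four made-up points on a line rather than the real stations (`pos_km`, `adj_toy`, `g_toy` and `memb` are hypothetical names), shows that two points can share a component even when they are more than 11 km apart, as long as an intermediate point links them:

```r
library(igraph)

# Hypothetical positions along a line, in km (not the real station data)
pos_km <- c(0, 7, 15, 50)
d_toy  <- as.matrix(dist(pos_km))

# 1 where two points are less than 11 km apart, 0 otherwise
adj_toy <- (d_toy < 11) * 1

g_toy <- graph_from_adjacency_matrix(adj_toy, mode = "undirected", diag = FALSE)
memb  <- unname(components(g_toy)$membership)

# Points 1, 2 and 3 get the same label although points 1 and 3 are 15 km apart,
# because point 2 is within 11 km of both; point 4 forms its own component
memb
```

So the components are groups linked by hops of under 11 km, not groups whose overall extent is capped at 11 km.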

We can get the component that each station belongs to and put this back into our original data frame:
df$group <- factor(components(g)$membership)
This labels each station by its connected component, as we can see from the resulting plot:
library(ggplot2)
ggplot(rnaturalearth::ne_countries(scale = 10, returnclass = 'sf')) +
geom_sf() +
geom_point(data = df, aes(x = longitude, y = latitude, color = group)) +
coord_sf(xlim = c(71.5, 72.5), ylim = c(-6, -5))
Created on 2023-09-26 with reprex v2.0.2
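If the requirement is read strictly, so that no two points in a group may be more than 11 km apart, one possible alternative (a different technique from the graph approach above, offered only as a sketch) is complete-linkage hierarchical clustering cut at the distance threshold. With complete linkage, the merge height is the largest pairwise distance in the merged cluster, so cutting the tree at 11 km caps the spread of every group at 11 km. The example below uses base R and made-up positions on a line; `pos_km`, `hc` and `grp` are hypothetical names:

```r
# Hypothetical positions along a line, in km (not the real station data)
pos_km <- c(0, 7, 15, 50)
d_toy  <- dist(pos_km)

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members
hc <- hclust(d_toy, method = "complete")

# Cut the tree at 11 km, so no two points in a group are further apart than that
grp <- cutree(hc, h = 11)
grp

# Largest within-group distance, for checking the 11 km cap
max_within <- max(sapply(split(pos_km, grp), function(x) diff(range(x))))
max_within
```

For the station data, the same idea should carry over along the lines of `hc <- hclust(as.dist(matrix(as.numeric(st_distance(points)), nrow = nrow(df))), method = "complete")` followed by `df$group <- factor(cutree(hc, h = 11000))`, with the threshold in metres. This will generally produce more, smaller groups than the connected-components approach.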
