I would like to create a dataset that has non intersecting polygons and their attributes as well as the attributes of both/all polygons when they do intersect.
to demo the problem I create some made up data
set.seed(131)
library(sf)
#create example data
m = rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0))
p = st_polygon(list(m))
n = 20
l = vector("list", n)
for (i in 1:n)
l[[i]] = p + 10 * runif(2)
s = st_sfc(l)
plot(s, col = sf.colors(categorical = TRUE, alpha = .5))
title("overlapping squares")
#convert to sf object
s <- st_as_sf(s)
#add some random info
s$tree_type <- rep(c("oak", "pine", "fir"), length.out=nrow(s))
And the data looks like this:
Simple feature collection with 20 features and 1 field
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 0.3840939 ymin: 1.132567 xmax: 10.19807 ymax: 10.14945
CRS: NA
First 10 features:
x tree_type
1 POLYGON ((2.06437 1.249422,... oak
2 POLYGON ((2.932732 3.757797... pine
3 POLYGON ((8.463468 5.292048... fir
4 POLYGON ((5.186254 2.378545... oak
5 POLYGON ((3.263181 9.149452... pine
6 POLYGON ((4.162483 3.446414... fir
7 POLYGON ((8.651544 3.160404... oak
8 POLYGON ((5.231186 2.916041... pine
9 POLYGON ((0.3840939 6.87075... fir
10 POLYGON ((7.742229 4.721718... oak
Then I do a self intersection like so
s_intersection <- st_intersection(s)
which creates data like this
Simple feature collection with 31 features and 3 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 0.3840939 ymin: 1.132567 xmax: 10.19807 ymax: 10.14945
CRS: NA
First 10 features:
tree_type n.overlaps origins x
1 oak 1 1 POLYGON ((2.06437 1.249422,...
2 pine 1 2 POLYGON ((3.932732 4.757797...
3 fir 1 3 POLYGON ((8.463468 6.292048...
4 oak 1 4 POLYGON ((6.186254 2.378545...
5 pine 1 5 POLYGON ((3.263181 9.149452...
6 fir 1 6 POLYGON ((4.162483 3.446414...
7 oak 1 7 POLYGON ((9.651544 3.160404...
4.1 oak 2 4, 8 POLYGON ((6.186254 2.916041...
8 pine 1 8 MULTIPOLYGON (((5.826989 3....
9 fir 1 9 POLYGON ((0.3840939 6.87075...
What I want to do is replace the column origins with attributes from both polygons. So for example polygon 4.1 would become oak, pine. Or it is fine if it is in two columns. The way I am thinking about approaching this problem is to unlist observations in the origins column so that each value is in its own column, then bind the attributes of the polygons to each value in the multiple origins columns. But is there an easier way?
>Solution :
one possibility, using {dplyr} for convenience:
library(dplyr)
s_intersection |>
rowwise() |>
mutate(tree_type = list(s$tree_type[origins])) |>
## only if you want a string instead of a list of tree types:
mutate(tree_type = paste(tree_type, collapse = ', '))
# A tibble: 31 x 4
# Rowwise:
tree_type n.overlaps origins x
* <chr> <int> <list> <GEOMETRY>
## ...
7 oak 1 <int [1]> POLYGON ((9.651544 3.160404, 8.651544 3.16040~
8 oak, pine 2 <int [2]> POLYGON ((6.186254 2.916041, 5.231186 2.91604~
9 pine 1 <int [1]> MULTIPOLYGON (((5.826989 3.916041, 5.826989 3~
### ...
Or, if you want to replicate polygons into one row per tree type, separate_rows from {tidyr} might be handy.
