I am working with the R programming language.
I have the following map (shapefile):
library(sf)
library(leaflet)
nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"), quiet = TRUE) %>%
st_transform(st_crs(4326)) %>%
st_cast('POLYGON')
Now, suppose I have a dataset with information for different polygons within this map (I made some areas missing on purpose):
set.seed(123)
unemployement_rate = rnorm(nrow(nc), 50,5)
n <- nrow(nc)
n_NA <- round(n * 0.1)
idx <- sample(n, n_NA)
unemployement_rate[idx] <- NA
my_df = data.frame(nc$NAME, unemployement_rate)
My Question: Assume that both of the above objects already exist.
I would like to bring the unemployment rate into the "nc" object. I am trying to merge the two in such a way that the number of rows in "nc" does not change.
In the past, I used the match() function, as suggested in a previous question (Merging a Shapefile and a dataframe). However, when I did this, the NAs were removed.
Thus, I tried to solve this problem a different way:
names(my_df) <- c("NAME", "unemployement_rate")
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)
# optional : replace the NA with 9999
# nc_merged$unemployement_rate[is.na(nc_merged$unemployement_rate)] <- 9999
However, nc_merged now appears to have more rows than the original object:
> dim(nc)
[1] 108 15
> dim(my_df)
[1] 108 2
> dim(nc_merged)
[1] 128 16
Can someone please show me why this is happening and how I can fix this?
Thanks!
Solution:
I misunderstood. You can just use the merge function without aggregating, provided the key column is unique in at least one of the two tables:
library(sf)
# Read the shapefile
# (sf does not export %>%, so the native pipe |> from R >= 4.1 is used here)
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE) |>
  st_transform(st_crs(4326)) |>
  st_cast("POLYGON")
# Generate the dataset with unemployment rate
set.seed(123)
unemployment_rate <- rnorm(nrow(nc), 50, 5)
n_NA <- round(nrow(nc) * 0.1)
idx <- sample(nrow(nc), n_NA)
unemployment_rate[idx] <- NA
my_df <- data.frame(NAME = nc$NAME, unemployment_rate)
# Merge the datasets by NAME
# (a name that occurs k times in both tables produces k * k merged rows)
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)
# View the dimensions of the merged dataset
dim(nc) # Original nc dataset
dim(nc_merged) # Merged dataset
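If the row count still grows, the likely cause is that st_cast("POLYGON") splits multi-part counties into several rows that share the same NAME, so NAME is duplicated in both nc and my_df and merge() pairs every copy with every other copy. The sketch below checks for this and shows two possible ways around it. It assumes that my_df was built in the same row order as nc (as in the question); row_id is a hypothetical helper column introduced only for illustration.

```r
library(sf)

# Rebuild the objects from the question (native pipe needs R >= 4.1)
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE) |>
  st_transform(st_crs(4326)) |>
  st_cast("POLYGON")

set.seed(123)
unemployment_rate <- rnorm(nrow(nc), 50, 5)
unemployment_rate[sample(nrow(nc), round(nrow(nc) * 0.1))] <- NA
my_df <- data.frame(NAME = nc$NAME, unemployment_rate)

# Diagnosis: number of rows whose NAME has already appeared earlier
sum(duplicated(nc$NAME))

# A name occurring k times in both tables contributes k^2 merged rows,
# so this sum is the row count that merge(..., by = "NAME") returns
name_counts <- table(nc$NAME)
sum(name_counts^2)

# Option 1 (assumption: my_df rows line up with nc rows):
# attach the column directly, no join needed
nc_fixed <- nc
nc_fixed$unemployment_rate <- my_df$unemployment_rate

# Option 2: merge on a unique key instead of NAME
nc_key <- nc
nc_key$row_id <- seq_len(nrow(nc_key))   # hypothetical helper column
df_key <- my_df
df_key$row_id <- seq_len(nrow(df_key))
nc_merged <- merge(nc_key, df_key[, c("row_id", "unemployment_rate")],
                   by = "row_id", all.x = TRUE)

dim(nc)         # original
dim(nc_fixed)   # same number of rows, one extra column
dim(nc_merged)  # same number of rows, row_id and rate added
```

Both options keep the NA values. If the row order of the two objects cannot be trusted, a unique identifier would have to exist in both before st_cast() is applied, since NAME alone cannot tell the polygons of one county apart.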