R: Using the MATCH function as a JOIN?

I am working with the R programming language.

I have the following map (shapefile):

library(sf)  
library(leaflet)

nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"), quiet = TRUE) %>% 
  st_transform(st_crs(4326)) %>% 
  st_cast('POLYGON')

Now, suppose I have a dataset with information for different polygons within this map (I made some areas missing on purpose):

set.seed(123)
unemployement_rate = rnorm(nrow(nc), 50,5)
n <- nrow(nc)
n_NA <- round(n * 0.1)
idx <- sample(n, n_NA)
unemployement_rate[idx] <- NA

my_df = data.frame(nc$NAME, unemployement_rate)

My Question: Assume that both of the above files already exist.

I would like to bring the unemployment rate into the "nc" object. I am trying to merge the two in such a way that the number of rows in "nc" does not change.

In the past, I used the match() function, as suggested in a previous question (Merging a Shapefile and a dataframe). However, when I did this, the NAs were removed.
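For reference, a match()-based lookup usually looks something like the sketch below. The data frames `shapes` and `rates` are hypothetical stand-ins for "nc" and "my_df", not the real objects. In a plain lookup like this, the row count stays the same and NA values are carried through as-is, so the NA removal described above may have come from a different step.

```r
# Hypothetical toy data standing in for nc and my_df
shapes <- data.frame(NAME = c("A", "B", "B", "C"), stringsAsFactors = FALSE)
rates  <- data.frame(NAME = c("A", "B", "C"),
                     unemployment_rate = c(48.2, NA, 51.7),
                     stringsAsFactors = FALSE)

# For each shapes$NAME, match() gives the position of its first match in rates$NAME
pos <- match(shapes$NAME, rates$NAME)

# Index the rate column by those positions; unmatched or NA rates stay NA
shapes$unemployment_rate <- rates$unemployment_rate[pos]

nrow(shapes)                           # row count is unchanged
sum(is.na(shapes$unemployment_rate))   # NAs are still present
```

Because match() only returns the first match, this works best when the lookup table has one row per NAME.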

Thus, I tried to solve this problem a different way:

names(my_df) <- c("NAME", "unemployement_rate")
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)

# optional : replace the NA with 9999 
# nc_merged$unemployement_rate[is.na(nc_merged$unemployement_rate)] <- 9999

However, there now appear to be more rows in nc_merged than in the original file:

> dim(nc)
[1] 108  15
> dim(my_df)
[1] 108   2

> dim(nc_merged)
[1] 128  16

Can someone please show me why this is happening and how I can fix this?

Thanks!

Solution:

I misunderstood. You can just use the merge function without aggregating:

library(sf)
library(magrittr)  # makes %>% available explicitly

# Read the shapefile
nc <- st_read(system.file("gpkg/nc.gpkg", package = "sf"), quiet = TRUE) %>%
  st_transform(st_crs(4326)) %>%
  st_cast("POLYGON")

# Generate the dataset with unemployment rate
set.seed(123)
unemployment_rate <- rnorm(nrow(nc), 50, 5)
n_NA <- round(nrow(nc) * 0.1)
idx <- sample(nrow(nc), n_NA)
unemployment_rate[idx] <- NA
my_df <- data.frame(NAME = nc$NAME, unemployment_rate)

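# Merge the datasets by NAME
# (NAME repeats for counties that st_cast() split into several polygons,
#  so compare the dimensions below to check for duplicated rows)
nc_merged <- merge(nc, my_df, by = "NAME", all.x = TRUE)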

# View the dimensions of the merged dataset
dim(nc) # Original nc dataset
dim(nc_merged) # Merged dataset
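
As for why the row count can grow: st_cast("POLYGON") splits multi-part counties into separate rows that share the same NAME, and my_df is built from nc$NAME, so it carries the same repeats. merge() pairs every matching row on the left with every matching row on the right, so a name that appears k times on both sides yields k * k rows. The jump from 108 to 128 rows in the question is consistent with that. The sketch below uses hypothetical toy data (`shapes`, `rates`, and the helper column `row_id` are not from the original post) to show the effect and one possible way around it.

```r
# Hypothetical toy data: "B" appears twice, as a county split into two polygons would
shapes <- data.frame(NAME = c("A", "B", "B", "C"),
                     part = 1:4)  # stand-in for the geometry column
rates  <- data.frame(NAME = shapes$NAME,
                     unemployment_rate = c(48.2, 50.1, NA, 51.7))

# A key duplicated on both sides pairs every "B" row with every "B" row
blown_up <- merge(shapes, rates, by = "NAME", all.x = TRUE)
nrow(blown_up)  # more rows than shapes

# One possible fix: merge on a key that is unique per row
# (row_id is a hypothetical helper column)
shapes$row_id <- seq_len(nrow(shapes))
rates$row_id  <- seq_len(nrow(rates))
fixed <- merge(shapes, rates[, c("row_id", "unemployment_rate")],
               by = "row_id", all.x = TRUE)
nrow(fixed)     # same number of rows as shapes, NA kept
```

In the question's setup, my_df is created from nc in row order, so the same idea should carry over by adding a row id to both objects before merging. Assuming neither object has been reordered, assigning the column directly (nc$unemployment_rate <- my_df$unemployment_rate) would also keep the row count unchanged.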