Based on the data below, how can I add a third column, Type? The type of each hospital is determined by certain words in the hospital name.
Word            Type
Government      Government
Govt            Government
St Jude         Religious
Catholic        Religious
District        District
Community       Community
Divine Mercy    Religious
St. Luke        Religious
St. Theresa     Religious
Islamic         Religious
Babtist         Religious
Data:
df = structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
Hospital = c("A Governtment Hospital", "Goverment B Hospital",
"C Govt Hospital", "D St Jude Hospital", "D Catholic Hospital",
"Catholic E Hospital", "F District Hospital", "G Community Hospital",
"H Divine Mercy Hospital", "I St. Luke Hospital", "J St. Theresa Hospital",
"Babtist Hospital")), class = "data.frame", row.names = c(NA,
-12L))
# Desired df
df_desired = structure(list(Hospital = c("A Governtment Hospital", "Goverment B Hospital",
"C Govt Hospital", "D St Jude Hospital", "D Catholic Hospital",
"Catholic E Hospital", "F District Hospital", "G Community Hospital",
"H Divine Mercy Hospital", "I St. Luke Hospital", "J St. Theresa Hospital",
"Babtist Hospital"), Type = c("Government", "Government",
"Government", "Religious", "Religious", "Religious", "District",
"Community", "Religious", "Religious", "Religious", "Religious"
)), class = "data.frame", row.names = c(NA, -12L))
>Solution :
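If the word/type pairs are stored as a key/value dataset (keydat, given under "data" below), we can use regex_left_join from fuzzyjoin. It treats each Word as a regular expression and joins it to every Hospital name it matches.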
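library(fuzzyjoin)
library(dplyr)
# Match each Hospital name against the regex patterns in keydat$Word,
# then drop the Word column so only id, Hospital and Type remain
regex_left_join(df, keydat, by = c("Hospital" = "Word")) %>%
  select(-Word)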
-output
   id                Hospital       Type
1   1  A Governtment Hospital Government
2   2    Goverment B Hospital Government
3   3         C Govt Hospital Government
4   4      D St Jude Hospital  Religious
5   5     D Catholic Hospital  Religious
6   6     Catholic E Hospital  Religious
7   7     F District Hospital   District
8   8    G Community Hospital  Community
9   9 H Divine Mercy Hospital  Religious
10 10     I St. Luke Hospital  Religious
11 11  J St. Theresa Hospital  Religious
12 12        Babtist Hospital  Religious
data
keydat <- structure(list(Word = c("Gover(nt?)?ment", "Govt", "St Jude",
"Catholic", "District", "Community", "Divine Mercy", "St. Luke",
"St. Theresa", "Islamic", "Babtist"), Type = c("Government",
"Government", "Religious", "Religious", "District", "Community",
"Religious", "Religious", "Religious", "Religious", "Religious"
)), row.names = c(NA, -11L), class = "data.frame")
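
If installing fuzzyjoin is not an option, the same lookup can be sketched in base R with grepl. This is only a minimal sketch under some assumptions: the names hosp, key and first_match are hypothetical, the data is a small stand-in rather than the full df and keydat, and each hospital gets the Type of the first pattern that matches. That last point differs from regex_left_join, which returns one row per matching pattern.

```r
# Small stand-in data (hypothetical; substitute the real df and keydat)
hosp <- data.frame(
  id = 1:5,
  Hospital = c("A Governtment Hospital", "C Govt Hospital",
               "D St Jude Hospital", "F District Hospital",
               "Z Private Clinic"),
  stringsAsFactors = FALSE
)
key <- data.frame(
  Word = c("Gover(nt)?ment", "Govt", "St Jude", "District"),
  Type = c("Government", "Government", "Religious", "District"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first pattern that matches one name,
# or NA when no pattern matches
first_match <- function(name, patterns) {
  hit <- which(vapply(patterns, grepl, logical(1), x = name,
                      USE.NAMES = FALSE))
  if (length(hit)) hit[1] else NA_integer_
}

# One index per hospital, then look up the corresponding Type
idx <- vapply(hosp$Hospital, first_match, integer(1),
              patterns = key$Word, USE.NAMES = FALSE)
hosp$Type <- key$Type[idx]
hosp
```

Hospitals that match no pattern (the last row here) end up with NA in Type, which makes gaps in the key table easy to spot.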