Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I cross check a data frame if all possible combinations on it exist in another data frame of reference in R using dplyr?

I have two data frames.

The first one that contain all the possible combinations with their corresponding value and looks like this :

first second val
Alpha Beta 10
Alpha Corn 20
Alpha Desk 30
Beta Corn 40
Betea Desk 50
Corn Desk 60
Hat Ian 70

The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

date var
2022-01-01 A
2022-02-01 Beta,Corn,Fanta,Epsilon,George,Hat,Ian

I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame.If a variable stands alone in the second data frame as Alpha in 2022-01-01 to give me the 0 and otherwise the value of the combination.

Ideally I want the resulted data frame to look like this :

date comb val
2022-01-01 Alpha 0
2022-02-01 Beta,Corn 40
2022-02-01 Hat,Ian 70

How can I do this in R using dplyr ?



library(tidyverse)
first = c("Alpha","Alpha","Alpha","Beta","Beta","Corn","Hat")
second = c("Beta","Corn","Desk","Corn","Desk","Desk","Ian")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("Alpha","Beta,Corn,Fanta,Epsilon,George,Hat,Ian")
df2  = tibble(date,var);df2 



>Solution :

library(dplyr)
library(tidyr)
df2 %>% 
  separate_rows(var) %>%
  group_by(date) %>% 
  summarise(var = if(length(var) > 1) list(combn(var, 2,  \(x) 
   tibble(first = x[1], second = x[2]), simplify = FALSE) %>% 
    bind_rows) else 
  list(tibble(first = var, second = var)) ) %>% 
 unnest(var) %>%
 inner_join(df1, by = c("first", "second")) %>% 
 full_join(df2, by = "date") %>% 
 mutate(first = coalesce(first, var)) %>%
  unite(combn, first, second, sep = ", ") %>% 
 select(-var)

-output

# A tibble: 3 × 3
  date       combn        val
  <date>     <chr>      <dbl>
1 2022-02-01 Beta, Corn    40
2 2022-02-01 Hat, Ian      70
3 2022-01-01 Alpha, NA     NA
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading