I have two data frames.
The first one that contain all the possible combinations with their corresponding value and looks like this :
| first | second | val |
|---|---|---|
| Alpha | Beta | 10 |
| Alpha | Corn | 20 |
| Alpha | Desk | 30 |
| Beta | Corn | 40 |
| Betea | Desk | 50 |
| Corn | Desk | 60 |
| Hat | Ian | 70 |
The second one that comes from the production line has two columns the date column that has grouped all the variables corresponding to their date and are concatenated :
| date | var |
|---|---|
| 2022-01-01 | A |
| 2022-02-01 | Beta,Corn,Fanta,Epsilon,George,Hat,Ian |
I want to find all the combinations in the second data frame and to see if they match with any combinations in the first data frame.If a variable stands alone in the second data frame as Alpha in 2022-01-01 to give me the 0 and otherwise the value of the combination.
Ideally I want the resulted data frame to look like this :
| date | comb | val |
|---|---|---|
| 2022-01-01 | Alpha | 0 |
| 2022-02-01 | Beta,Corn | 40 |
| 2022-02-01 | Hat,Ian | 70 |
How can I do this in R using dplyr ?
library(tidyverse)
first = c("Alpha","Alpha","Alpha","Beta","Beta","Corn","Hat")
second = c("Beta","Corn","Desk","Corn","Desk","Desk","Ian")
val = c(10,20,30,40,50,60,70)
df1 = tibble(first,second,val);df1
date = c(as.Date("2022-01-01"),as.Date("2022-02-01"))
var = c("Alpha","Beta,Corn,Fanta,Epsilon,George,Hat,Ian")
df2 = tibble(date,var);df2
>Solution :
library(dplyr)
library(tidyr)
df2 %>%
separate_rows(var) %>%
group_by(date) %>%
summarise(var = if(length(var) > 1) list(combn(var, 2, \(x)
tibble(first = x[1], second = x[2]), simplify = FALSE) %>%
bind_rows) else
list(tibble(first = var, second = var)) ) %>%
unnest(var) %>%
inner_join(df1, by = c("first", "second")) %>%
full_join(df2, by = "date") %>%
mutate(first = coalesce(first, var)) %>%
unite(combn, first, second, sep = ", ") %>%
select(-var)
-output
# A tibble: 3 × 3
date combn val
<date> <chr> <dbl>
1 2022-02-01 Beta, Corn 40
2 2022-02-01 Hat, Ian 70
3 2022-01-01 Alpha, NA NA