I have a data.frame that assigns ids to groups. In the simplest scenario each id is assigned to a different group:
df1 <- data.frame(group = c("a1", "a2"),
                  id = c("i1", "i2"),
                  stringsAsFactors = FALSE)
In a second scenario all ids are assigned to one group:
df2 <- data.frame(group = c("a1", "a1"),
                  id = c("i1", "i2"),
                  stringsAsFactors = FALSE)
In the third scenario the id-to-group assignment is ambiguous, because one id (i1) belongs to more than one group:
df3 <- data.frame(group = c("a1", "a2", "a2"),
                  id = c("i1", "i1", "i2"),
                  stringsAsFactors = FALSE)
I’m looking for a function that, given such a data.frame with id and group columns, returns the label "scenario1", "scenario2" or "scenario3" according to the scenarios above.
In other words, the function should return "scenario1" for df1, "scenario2" for df2 and "scenario3" for df3.
This can obviously be done with if statements, but I’m hoping for something faster using dplyr/tidyverse or data.table.
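For reference, the plain if-statement approach might look like this minimal base-R sketch (`return_scenario_base` is a hypothetical name). It keeps each (group, id) pair once, then checks whether there is only one group and whether any id appears in more than one group. No claim is made here about its speed relative to dplyr or data.table.

```r
# Example inputs from the question
df1 <- data.frame(group = c("a1", "a2"), id = c("i1", "i2"), stringsAsFactors = FALSE)
df2 <- data.frame(group = c("a1", "a1"), id = c("i1", "i2"), stringsAsFactors = FALSE)
df3 <- data.frame(group = c("a1", "a2", "a2"), id = c("i1", "i1", "i2"), stringsAsFactors = FALSE)

# Hypothetical helper: classify the id-to-group assignment with plain if/else
return_scenario_base <- function(df) {
  # Drop repeated (group, id) rows so exact duplicates don't count as ambiguity
  pairs <- unique(df[, c("group", "id")])
  if (length(unique(pairs$group)) == 1) {
    "scenario2"   # all ids sit in a single group
  } else if (!anyDuplicated(pairs$id)) {
    "scenario1"   # every id appears in exactly one group
  } else {
    "scenario3"   # at least one id is assigned to several groups
  }
}

vapply(list(df1, df2, df3), return_scenario_base, character(1))
#[1] "scenario1" "scenario2" "scenario3"
```

The single-group check comes first because in df2 every id is also unique, so it would otherwise be classified as scenario1.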
Solution:
Here’s a function that checks the conditions in order and returns the first one that matches.
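library(dplyr)
return_scenario <- function(df) {
  # Keep each (group, id) pair once so repeated rows don't look ambiguous
  tmp <- df %>% distinct(group, id)
  # The first matching condition wins: the single-group check must come
  # before the unique-id check, because every id in df2 is also unique
  case_when(
    n_distinct(tmp$group) == 1 ~ 'scenario2',
    n_distinct(tmp$id) == nrow(tmp) ~ 'scenario1',
    TRUE ~ 'scenario3')
}
return_scenario(df1)
#[1] "scenario1"
return_scenario(df2)
#[1] "scenario2"
return_scenario(df3)
#[1] "scenario3"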
If needed, this can also be translated to base R or data.table using their equivalent functions.
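A minimal sketch of the data.table translation, assuming the data.table package is installed (`return_scenario_dt` is a hypothetical name). Here `unique()` on the two columns stands in for `distinct()`, and `uniqueN()` stands in for `n_distinct()`; the checks run in the same order as above.

```r
library(data.table)

# Example inputs from the question
df1 <- data.frame(group = c("a1", "a2"), id = c("i1", "i2"), stringsAsFactors = FALSE)
df2 <- data.frame(group = c("a1", "a1"), id = c("i1", "i2"), stringsAsFactors = FALSE)
df3 <- data.frame(group = c("a1", "a2", "a2"), id = c("i1", "i1", "i2"), stringsAsFactors = FALSE)

# Hypothetical data.table version of return_scenario()
return_scenario_dt <- function(df) {
  # Unique (group, id) pairs, equivalent to distinct(group, id)
  dt <- unique(as.data.table(df)[, .(group, id)])
  if (uniqueN(dt$group) == 1) {
    "scenario2"                          # one group for all ids
  } else if (uniqueN(dt$id) == nrow(dt)) {
    "scenario1"                          # each id in exactly one group
  } else {
    "scenario3"                          # some id maps to several groups
  }
}

return_scenario_dt(df3)
#[1] "scenario3"
```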