I want to create a new variable based on two conditions, for which the second condition may take two alternative values. Is there a way in the case_when function to combine "and" with "or" in one line of code?
Say this is the first dataframe:
df1 <- data.frame(company = c("A", "B", "C", "D"),
                  since = c(2014, 2017, 2014, 2013),
                  end = c(NA, NA, NA, 2014))
And this is the second:
df2 <- data.frame(company = c("A", "B", "C", "D",
                              "A", "B", "C", "D",
                              "A", "B", "C", "D"),
                  year = c(2013, 2013, 2013, 2013,
                           2014, 2014, 2014, 2014,
                           2015, 2015, 2015, 2015))
Now I want to join the two and create a new variable based on conditions from the first dataframe. The new variable should get a score of 1 when year is the same as or greater than since AND end is either greater than year OR NA.
The following code works:
library(dplyr)
df3 <- df2 %>%
  left_join(df1, by = "company") %>%
  mutate(new_var = case_when(year >= since & end > year ~ 1,
                             year >= since & is.na(end) ~ 1,
                             TRUE ~ 0)) %>%
  select(!c(since, end))
And gives me:
company year new_var
1 A 2013 0
2 B 2013 0
3 C 2013 0
4 D 2013 1
5 A 2014 1
6 B 2014 0
7 C 2014 1
8 D 2014 0
9 A 2015 1
10 B 2015 0
11 C 2015 1
12 D 2015 0
But, ideally I would want to combine this into one line of code. Something like:
df3 <- df2 %>%
  left_join(df1, by = "company") %>%
  mutate(new_var = case_when(year >= since & end > year | is.na(end) ~ 1,
                             TRUE ~ 0)) %>%
  select(!c(since, end))
But this gives me:
company year new_var
1 A 2013 1
2 B 2013 1
3 C 2013 1
4 D 2013 1
5 A 2014 1
6 B 2014 1
7 C 2014 1
8 D 2014 0
9 A 2015 1
10 B 2015 1
11 C 2015 1
12 D 2015 0
So is there a way to combine this into the same line of code? Any help appreciated.
>Solution :
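Wrap the OR in parentheses. In R, & binds more tightly than |, so year >= since & end > year | is.na(end) is evaluated as (year >= since & end > year) | is.na(end), which returns 1 for every row where end is NA. Grouping the OR explicitly gives the intended result: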
df2 %>%
  left_join(df1, by = "company") %>%
  mutate(new_var = case_when(year >= since & (end > year | is.na(end)) ~ 1,
                             TRUE ~ 0)) %>%
  select(!c(since, end))
# company year new_var
# 1 A 2013 0
# 2 B 2013 0
# 3 C 2013 0
# 4 D 2013 1
# 5 A 2014 1
# 6 B 2014 0
# 7 C 2014 1
# 8 D 2014 0
# 9 A 2015 1
# 10 B 2015 0
# 11 C 2015 1
# 12 D 2015 0
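To see why the parentheses matter, it may help to evaluate both versions of the condition on a single row. The following is a minimal base R sketch, not part of the original code: the scalar values are illustrative and correspond to company A in 2013 after the join (since = 2014, end = NA), and the names unbracketed and bracketed are made up for this example.

```r
# Illustrative values: company A in 2013 after the join.
year  <- 2013
since <- 2014
end   <- NA

# Without parentheses, & is evaluated before |, so this is read as
# (year >= since & end > year) | is.na(end).
unbracketed <- year >= since & end > year | is.na(end)

# With parentheses, the OR is grouped first and then combined with the AND.
bracketed <- year >= since & (end > year | is.na(end))

unbracketed  # TRUE: is.na(end) alone is enough to satisfy the condition
bracketed    # FALSE: year >= since fails, so the whole condition fails
```

This matches the two outputs in the question: row 1 (A, 2013) gets 1 from the unbracketed version and 0 from the bracketed one.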