I have the following dataframe:
tibble(year = c("2020Q1", "2020Q2", "2020Q3", "2020Q4", "2020END","2020Q1", "2020Q2", "2020Q3", "2020Q4", "2020Q1","2020Q2", "2020Q3", "2020Q4", "2020END"),
website = c("google","google","google","google","google", "facebook","facebook","facebook","facebook","youtube","youtube","youtube","youtube", "youtube"),
something = c(1,2,3,4,5,6,7,8,9,10,11,12,13, 14)
)
How can i make such a condition so that:
When Q4 and END are mentioned for the same website, then to take only the END values – but this is not the case only take Q4 values?
Desired output is the following table:
tibble(year = c("2020Q1", "2020Q2", "2020Q3","2020END","2020Q1", "2020Q2", "2020Q3", "2020Q4", "2020Q1","2020Q2", "2020Q3", "2020END"),
website = c("google","google","google","google", "facebook","facebook","facebook","facebook","youtube","youtube","youtube","youtube"),
something = c(1,2,3,5,6,7,8,9,10,11,12, 14)
)
I have tried using filter() but i assume there is a better condition, maybe pivoting? Assume I am not working with dates in the year column – as it really is only the string i am searching for Q4 and END to occur
>Solution :
An idea,
library(tidyverse)
df %>%
group_by(website) %>%
mutate(new = sum(grepl('Q4|END', year)) == 2 & grepl('Q4', year)) %>%
filter(!new) %>%
select(-new)
# A tibble: 12 x 3
# Groups: website [3]
year website something
<chr> <chr> <dbl>
1 2020Q1 google 1
2 2020Q2 google 2
3 2020Q3 google 3
4 2020END google 5
5 2020Q1 facebook 6
6 2020Q2 facebook 7
7 2020Q3 facebook 8
8 2020Q4 facebook 9
9 2020Q1 youtube 10
10 2020Q2 youtube 11
11 2020Q3 youtube 12
12 2020END youtube 14