I am trying to add a column to my dataframe based on whether a string is detected in another column. I currently do this in two chunks of code and then merge the results, but I want to streamline the code so that there is less to type out in the future. I also noticed I performed a join incorrectly on a dataset I’ve been working with for months, so the fewer joins, the better.
Here is what currently works for me, but feels unnecessarily long.
dtc_final2022 <- dtc_final1 %>%
  filter(str_detect(detection_timestamp_utc, "2022")) %>%
  mutate(Year = "2022")
dtc_final2021 <- dtc_final1 %>%
  filter(str_detect(detection_timestamp_utc, "2021")) %>%
  mutate(Year = "2021")
dtc_final2 <- full_join(dtc_final2021, dtc_final2022)
dtc_final1 is a dataset with timestamps from many years. I am only interested in adding a "year" to timestamps that contain 2021 and 2022. In the future, I will add 2023 and 2024.
This is what I would like to do, but when I run it, the second ifelse() overwrites the 2021 values with NA. Is there a way to run an ifelse function without the ‘else’? Also, please remember that I can’t use the other year as the ‘else’, since in the future I will have 4 years to deal with, not just 2.
dtc_final2 <- dtc_final1 %>%
  mutate(Year = ifelse(str_detect(detection_timestamp_utc, "2021"), "2021", NA),
         Year = ifelse(str_detect(detection_timestamp_utc, "2022"), "2022", NA))
I try to do everything in dplyr, but if a for loop does the trick, then I guess I’ll buck up.
Thanks in advance!
>Solution :
You can use str_extract() rather than str_detect() here, with a regular expression that matches either of the years of interest. str_extract() returns NA wherever the pattern does not match, so no ‘else’ is needed:
mutate(dtc_final1, Year = str_extract(detection_timestamp_utc, "^202[12]"))
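To see how this behaves, here is a minimal sketch on a made-up stand-in for dtc_final1. The three sample timestamps and their format (year first) are assumptions, since the question does not show the real data. The second half shows case_when() as one possible way to get an "ifelse without the else": rows that match none of the conditions are left as NA.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; the real dtc_final1 and its timestamp format
# are not shown in the question, so a year-first format is assumed here.
dtc_final1 <- data.frame(
  detection_timestamp_utc = c("2020-05-01 12:00:00",
                              "2021-06-15 08:30:00",
                              "2022-07-20 23:45:00"),
  stringsAsFactors = FALSE
)

# str_extract() returns the matched text, or NA where nothing matches.
# "^" anchors the match to the start of the string.
dtc_final2 <- dtc_final1 %>%
  mutate(Year = str_extract(detection_timestamp_utc, "^202[12]"))

# Alternative sketch: case_when() checks the conditions in order and
# gives NA to any row that matches none of them.
dtc_final3 <- dtc_final1 %>%
  mutate(Year = case_when(
    str_detect(detection_timestamp_utc, "^2021") ~ "2021",
    str_detect(detection_timestamp_utc, "^2022") ~ "2022"
  ))

# The original filter-and-join code dropped the other years entirely;
# filtering out the NA rows reproduces that.
dtc_final_kept <- dtc_final2 %>%
  filter(!is.na(Year))
```

When 2023 and 2024 come along, widening the character class to "^202[1-4]" should be enough for the str_extract() version, whereas the case_when() version would need one extra line per year.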