Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R – how to replace strings in a column based on 2 or more values

I am trying to replace a string for specific groups of customer ids/order dates. The example below may better illustrate my question. I have a dataframe:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
df <- data.frame(customerid, orderdate, status)

I am trying to group by customerid and orderdate. Then in each group of customerid/orderddate, if there is a "positive" or "negative" in the status column, I want to replace "review" with "done". The replacement can be done in a new column or the same one. The result would be:

customerid <- c("A1", "A1", "A2", "A2", "A3", "A3", "A3", "A4")
orderdate <- c("2018-09-14", "2018-09-14", "2018-09-15", "2018-09-15", "2020-08-21", "2020-08-21","2020-08-21", "2018-08-10")
status <- c("review", "review", "review", "negative", "positive", "review", "review", "review")
new <- c("review", "review", "done", "negative", "positive", "done", "done", "review")
df <- data.frame(customerid, orderdate, status, new)

I tried using the dplyr package and group_by and filter functions but am not sure how to proceed from here.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df <- df %>% 
       group_by(customerid, orderdate) %>% 
       dplyr::filter(df$new == 'review' & df$new == 'positive') %>%   

Thank you!

>Solution :

If we want to replace the review to done only when there is a positve or negative value in status for that group, then after grouping, create a logical vector that checks whether there are any ‘positive’ or ‘negative’ %in% the ‘status’ and (&) the value is ‘review’, then replace with ‘done’ or else return the column values

library(dplyr)
df <- df %>% 
      group_by(customerid, orderdate) %>%
      mutate(new = case_when(any(c('positive', 'negative') %in% 
             status) & status == 'review' ~ 'done', TRUE ~ status)) %>% 
     ungroup

-output

df
# A tibble: 8 × 4
  customerid orderdate  status   new     
  <chr>      <chr>      <chr>    <chr>   
1 A1         2018-09-14 review   review  
2 A1         2018-09-14 review   review  
3 A2         2018-09-15 review   done    
4 A2         2018-09-15 negative negative
5 A3         2020-08-21 positive positive
6 A3         2020-08-21 review   done    
7 A3         2020-08-21 review   done    
8 A4         2018-08-10 review   review  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading