Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to replace values by multiple grouping conditions?

In the following simplified example, you can see an id column (e.g. Product ID), a Date column containing actual and future dates, a value column and a column, indicating if it is an actual value, or a forecasted value using ML.

My goal is to replace the NA’s for each Model, with the last value and date of the ACT column of each id. In my example this would mean for id A1, to replace the NA’s in ML1 and ML2 with 2014-01-01 as Date and 54 as Value.

library(tidyverse)

df <- tibble(id = c(rep("A1",11), rep("B1",11)),
             Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
             Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
                             "2014-01-01",NA, "2015-01-01","2016-01-01",
                             NA, "2015-01-01","2016-01-01"),2)),
             Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
                     c(54,41,32,65,76,NA,32,42,NA,23,76))
             )

I am searching for a pipeable solution like dplyr with no for-loops.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

How about this:

library(tidyverse)
#> Warning: package 'tidyr' was built under R version 4.1.2
#> Warning: package 'readr' was built under R version 4.1.2

df <- tibble(id = c(rep("A1",11), rep("B1",11)),
             Model = rep(c(rep("ACT",5), rep("ML1",3), rep("ML2",3)),2),
             Date = as.Date(rep(c("2010-01-01","2011-01-01","2012-01-01","2013-01-01",
                                  "2014-01-01",NA, "2015-01-01","2016-01-01",
                                  NA, "2015-01-01","2016-01-01"),2)),
             Value = c(c(11,31,44,21,54,NA,53,13,NA,33,12),
                       c(54,41,32,65,76,NA,32,42,NA,23,76))
)
df %>% 
group_by(id) %>% 
filter(Model == "ACT") %>% 
summarise(across(c(Date, Value), last)) %>% 
rename(date_fill = Date, value_fill = Value) %>% 
right_join(df) %>% 
mutate(Value = case_when(Model != "Act" & is.na(Value) ~ value_fill, TRUE  ~ Value), 
       Date = case_when(Model != "Act" & is.na(Date) ~ date_fill, TRUE  ~ Date)) %>% 
select(-c("date_fill", "value_fill"))
#> Joining, by = "id"
#> # A tibble: 22 × 4
#>    id    Model Date       Value
#>    <chr> <chr> <date>     <dbl>
#>  1 A1    ACT   2010-01-01    11
#>  2 A1    ACT   2011-01-01    31
#>  3 A1    ACT   2012-01-01    44
#>  4 A1    ACT   2013-01-01    21
#>  5 A1    ACT   2014-01-01    54
#>  6 A1    ML1   2014-01-01    54
#>  7 A1    ML1   2015-01-01    53
#>  8 A1    ML1   2016-01-01    13
#>  9 A1    ML2   2014-01-01    54
#> 10 A1    ML2   2015-01-01    33
#> # … with 12 more rows

Created on 2022-03-28 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading