
Fill NA until certain date based on different column per group

I have the following dataframe df (dput below):

   group      date1      date2 value
1      A 2022-01-01 2022-01-07    NA
2      A 2022-01-02 2022-01-07     1
3      A 2022-01-04 2022-01-07    NA
4      A 2022-01-10 2022-01-07    NA
5      B 2022-01-01 2022-01-06     3
6      B 2022-01-03 2022-01-06    NA
7      B 2022-01-04 2022-01-06    NA
8      B 2022-01-06 2022-01-06    NA
9      C 2022-01-01 2022-01-09    NA
10     C 2022-01-03 2022-01-09    NA
11     C 2022-01-04 2022-01-09     2
12     C 2022-01-11 2022-01-09    NA

I would like to fill the NA values downward within each group, but only up to date2. That is, every NA after the first non-NA value in a group (for group A, the value 1) should take that value as long as the row's date1 is on or before date2. This is the desired output:

   group      date1      date2 value
1      A 2022-01-01 2022-01-07    NA
2      A 2022-01-02 2022-01-07     1
3      A 2022-01-04 2022-01-07     1
4      A 2022-01-10 2022-01-07    NA
5      B 2022-01-01 2022-01-06     3
6      B 2022-01-03 2022-01-06     3
7      B 2022-01-04 2022-01-06     3
8      B 2022-01-06 2022-01-06     3
9      C 2022-01-01 2022-01-09    NA
10     C 2022-01-03 2022-01-09    NA
11     C 2022-01-04 2022-01-09     2
12     C 2022-01-11 2022-01-09    NA

As you can see, in group A only the next row is filled with 1, because the last row's date1 (2022-01-10) is after date2 (2022-01-07). How can I fill the NA values up to a date that differs per group?


Output of dput(df):

df<-structure(list(group = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C"), date1 = c("2022-01-01", "2022-01-02", "2022-01-04", 
"2022-01-10", "2022-01-01", "2022-01-03", "2022-01-04", "2022-01-06", 
"2022-01-01", "2022-01-03", "2022-01-04", "2022-01-11"), date2 = c("2022-01-07", 
"2022-01-07", "2022-01-07", "2022-01-07", "2022-01-06", "2022-01-06", 
"2022-01-06", "2022-01-06", "2022-01-09", "2022-01-09", "2022-01-09", 
"2022-01-09"), value = c(NA, 1, NA, NA, 3, NA, NA, NA, NA, NA, 
2, NA)), class = "data.frame", row.names = c(NA, -12L))
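Both date columns in this dput are character strings, so a comparison such as date1 > date2 is a text comparison. It agrees with a date comparison here only because ISO 8601 (YYYY-MM-DD) strings sort chronologically; with another format it can fail, e.g. "1/10/2022" > "1/7/2022" is FALSE even though January 10 is later. A small sketch using the group A rows, checking that the text and as.Date() comparisons agree for this data:

```r
# Group A rows from the example, stored as text exactly as in the dput
date1 <- c("2022-01-01", "2022-01-02", "2022-01-04", "2022-01-10")
date2 <- "2022-01-07"

# Text comparison: correct only because YYYY-MM-DD sorts chronologically
after_text <- date1 > date2
# Date comparison: parses the strings first, so it does not depend on the format
after_date <- as.Date(date1) > as.Date(date2)

print(after_text)
print(identical(after_text, after_date))
```

If the real data may use other formats, converting both columns with as.Date() (with an explicit format argument) before grouping is the safer choice.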

Solution:

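Create an additional grouping variable that marks whether date1 is after date2, then use tidyr::fill(). Rows past date2 end up in their own sub-group, so values from earlier rows are never carried into them: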

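library(dplyr)
library(tidyr)

df %>% 
  # Split each group into rows up to date2 (after = FALSE) and rows past it (after = TRUE).
  # The ISO-formatted strings compare correctly as text.
  group_by(group, after = date1 > date2) %>% 
  # Carry the last non-NA value downward within each sub-group.
  fill(value) %>% 
  ungroup() %>% 
  # Drop the helper column.
  select(-after)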

# A tibble: 12 × 4
   group date1      date2      value
   <chr> <chr>      <chr>      <dbl>
 1 A     2022-01-01 2022-01-07    NA
 2 A     2022-01-02 2022-01-07     1
 3 A     2022-01-04 2022-01-07     1
 4 A     2022-01-10 2022-01-07    NA
 5 B     2022-01-01 2022-01-06     3
 6 B     2022-01-03 2022-01-06     3
 7 B     2022-01-04 2022-01-06     3
 8 B     2022-01-06 2022-01-06     3
 9 C     2022-01-01 2022-01-09    NA
10 C     2022-01-03 2022-01-09    NA
11 C     2022-01-04 2022-01-09     2
12 C     2022-01-11 2022-01-09    NA
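One caveat, offered as a hedged observation rather than a problem with the data above: fill() also runs inside the after = TRUE sub-group, so a non-NA value in a row past date2 would be copied into later rows that are also past date2. If those rows must never be filled, one option is to carry values forward over the whole group and keep the filled value only where date1 is on or before date2. The sketch below does this in base R, swapping fill() for a hand-written last-observation-carried-forward helper (locf, a hypothetical name, not from any package) combined with ave(). It assumes rows are sorted by date1 within each group:

```r
# Last observation carried forward (hypothetical helper, not from a package):
# each NA takes the most recent non-NA value before it; leading NAs stay NA.
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # position of last non-NA so far
  idx[idx == 0L] <- NA_integer_                        # no earlier value: keep NA
  x[idx]
}

# Same data as the dput in the question
df <- data.frame(
  group = rep(c("A", "B", "C"), each = 4),
  date1 = c("2022-01-01", "2022-01-02", "2022-01-04", "2022-01-10",
            "2022-01-01", "2022-01-03", "2022-01-04", "2022-01-06",
            "2022-01-01", "2022-01-03", "2022-01-04", "2022-01-11"),
  date2 = rep(c("2022-01-07", "2022-01-06", "2022-01-09"), each = 4),
  value = c(NA, 1, NA, NA, 3, NA, NA, NA, NA, NA, 2, NA)
)

# Fill within each group, then keep the filled value only up to date2.
filled <- ave(df$value, df$group, FUN = locf)
in_window <- as.Date(df$date1) <= as.Date(df$date2)
df$value <- ifelse(in_window, filled, df$value)
print(df)
```

Rows past date2 keep whatever they originally held, so a value recorded after date2 is neither overwritten nor propagated.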
