Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to use dplyr to fill cells meeting a criteria with the value of the cell immediately above iteratively?

I am trying to use base R or dplyr to fill dataframe column cells that meet a specified criteria (value of 0 in this case) with the last value above before the 0. So, suppose we have the below data frame myDF, with columns Element, Group, eleCnt, and reSeq generated by the code immediately beneath. I would like to either add a new column "ADD" or modify the existing column reSeq with the manually input values I show under "ADD". Any recommendations for doing this in Base R or dplyr?

   Element Group eleCnt reSeq
   <chr>   <dbl>  <int> <dbl>  ADD  << ADD Explained
 1 R           0      1     1    1     since reSeq not 0 use current row reSeq value
 2 R           0      2     2    2     same as above
 3 X           0      1     1    1     same as above
 4 X           1      2     2    2     same as above
 5 X           1      3     0    2     since reSeq = 0 use reSeq value from last row prior to the 0
 6 X           0      4     4    4     since reSeq not 0 use current row reSeq value
 7 X           0      5     5    5     same as above
 8 X           0      6     6    6     same as above
 9 B           0      1     1    1     same as above
10 R           0      3     3    3     same as above
11 R           2      4     4    4     same as above
12 R           2      5     0    4     since reSeq = 0 use reSeq value from last row prior to the 0
13 X           3      7     7    7     since reSeq not 0 use current row reSeq value
14 X           3      8     0    7     since reSeq = 0 use last reSeq value <> 0 above
15 X           3      9     0    7     since reSeq = 0 use last reSeq value <> 0 above

myDF %>% 
  group_by(Element) %>%
    mutate(eleCnt = row_number()) %>%
  ungroup()%>%
  mutate(reSeq = ifelse(Group == 0 | Group != lag(Group), eleCnt,0) 
  )

Running dput(myDF):

> dput(myDF)
structure(list(Element = c("R", "R", "X", "X", "X", "X", "X", 
"X", "B", "R", "R", "R", "X", "X", "X"), Group = c(0, 0, 0, 1, 
1, 0, 0, 0, 0, 0, 2, 2, 3, 3, 3)), class = "data.frame", row.names = c(NA, 
-15L))

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We may change the 0s to NA and then do a group by fill

library(dplyr)
library(tidyr)
df1 %>%
   mutate(ADD = na_if(reSeq, 0)) %>%
   group_by(Element) %>%
   fill(ADD) %>%
   ungroup

-output

# A tibble: 15 × 5
   Element Group eleCnt reSeq   ADD
   <chr>   <dbl>  <int> <dbl> <dbl>
 1 R           0      1     1     1
 2 R           0      2     2     2
 3 X           0      1     1     1
 4 X           1      2     2     2
 5 X           1      3     0     2
 6 X           0      4     4     4
 7 X           0      5     5     5
 8 X           0      6     6     6
 9 B           0      1     1     1
10 R           0      3     3     3
11 R           2      4     4     4
12 R           2      5     0     4
13 X           3      7     7     7
14 X           3      8     0     7
15 X           3      9     0     7

The steps in the creation of ‘ADD’ can be simplified from the original data ‘myDF’

library(data.table)
myDF %>% 
  mutate(ADD =  rowid(Element) * NA^!(Group == 0 |!duplicated(Group))) %>% 
 group_by(Element) %>% 
 fill(ADD) %>% 
 ungroup

-output

# A tibble: 15 × 3
   Element Group   ADD
   <chr>   <dbl> <dbl>
 1 R           0     1
 2 R           0     2
 3 X           0     1
 4 X           1     2
 5 X           1     2
 6 X           0     4
 7 X           0     5
 8 X           0     6
 9 B           0     1
10 R           0     3
11 R           2     4
12 R           2     4
13 X           3     7
14 X           3     7
15 X           3     7

data

df1 <- structure(list(Element = c("R", "R", "X", "X", "X", "X", "X", 
"X", "B", "R", "R", "R", "X", "X", "X"), Group = c(0, 0, 0, 1, 
1, 0, 0, 0, 0, 0, 2, 2, 3, 3, 3), eleCnt = c(1L, 2L, 1L, 2L, 
3L, 4L, 5L, 6L, 1L, 3L, 4L, 5L, 7L, 8L, 9L), reSeq = c(1, 2, 
1, 2, 0, 4, 5, 6, 1, 3, 4, 0, 7, 0, 0)), row.names = c(NA, -15L
), class = c("tbl_df", "tbl", "data.frame"))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading