Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Replace missing values if previous and next values are consistent

I am currently working with time series data which looks something like the following:

ID Var1 Var2 Var3 Var4 Var5
1 A NA A NA A
2 B C NA NA B
3 A A NA NA A
4 A B NA NA B
5 C NA B NA B
df <- data.frame("ID" = c(1, 2, 3, 4, 5),
             "Var1" = c("A", "B", "A", "A", "C"),
             "Var2" = c(NA, "C", "A", "B", NA),
             "Var3" = c("A", NA, NA, NA, "B"),
             "Var4" = c(NA, NA, NA, NA, NA),
             "Var5" = c("A", "B", "A", "B", "B"))

I wish to fill in the "NA" values if the first non-missing previous and first non-missing next value are consistent. That is, the desired result would be

ID Var1 Var2 Var3 Var4 Var5
1 A A A A A
2 B C NA NA B
3 A A A A A
4 A B B B B
5 C NA B B B

Where the data for ID = 2 is not replaced, since Var2 and Var5 do not match. Moreover, the missing value for ID = 2 at Var2 is not replaced, since Var1 and Var3 are not consistent. I am struggling with how to accomplish this, and any help would be appreciated.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

  1. Pivot longer to make use of tidyr::fill().
  2. Use fill() to create fill_down and fill_up columns, which will include the previous and next non-missing values, respectively.
  3. If previous non-missing == next non-missing, use that value; otherwise keep value as is. (This will also keep non-missing values as is, because in this case previous non-missing will always == next non-missing.)
  4. Pivot back to original format.
library(tidyverse)

df_filled <- df %>% 
  pivot_longer(!ID) %>% 
  mutate(
    fill_down = value,
    fill_up = value
  ) %>% 
  group_by(ID) %>% 
  fill(fill_down) %>% 
  fill(fill_up, .direction = "up") %>% 
  mutate(value = if_else(fill_down == fill_up, fill_down, value)) %>% 
  ungroup() %>% 
  pivot_wider(id_cols = ID)
  
df_filled
# # A tibble: 5 x 6
#      ID Var1  Var2  Var3  Var4  Var5 
#   <dbl> <chr> <chr> <chr> <chr> <chr>
# 1     1 A     A     A     A     A    
# 2     2 B     C     NA    NA    B    
# 3     3 A     A     A     A     A    
# 4     4 A     B     B     B     B    
# 5     5 C     NA    B     B     B  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading