Let suppose the following dataset :
| ID | Number |
|---|---|
| AX | 1 |
| AX | NA |
| AX | NA |
| AX | 2 |
| AX | NA |
| AX | NA |
| AX | NA |
I would like to only fill NA which are between real observations observed, such as :
| ID | Number |
|---|---|
| AX | 1 |
| AX | 1 |
| AX | 1 |
| AX | 2 |
| AX | NA |
| AX | NA |
| AX | NA |
I tried to do it with tidyr::fill (DF %>% fill(Number)) but i cannot reproduce those results because it fills all the NA values. How can i do it ?
>Solution :
Here’s one method that fills and restores the NA for those cases where follow-on non-NA values were not found.
library(dplyr)
library(tidyr) # fill
quux %>%
group_by(ID) %>%
mutate(hasmore = rev(cumany(!is.na(rev(Number))))) %>%
fill(Number) %>%
mutate(Number = if_else(hasmore, Number, Number[NA])) %>%
ungroup()
# # A tibble: 7 × 3
# ID Number hasmore
# <chr> <chr> <lgl>
# 1 AX 1 TRUE
# 2 AX 1 TRUE
# 3 AX 1 TRUE
# 4 AX 2 TRUE
# 5 AX <NA> FALSE
# 6 AX <NA> FALSE
# 7 AX <NA> FALSE
I use Number[NA] instead of just NA because I’m not certain if your Number is character, numeric, or integer; Number[NA] is never wrong, it will always do as intended, but using the wrong type of NA (there are at least ten kinds of NA) can produce an error.