How to fill in NA values according to a categorical variable in R

I have some NA values in my data that I would like to fill in according to the filename column. In other words, every observation with the same filename should have the same values in the discipline, nativeness, year, and gender columns.

df1 <- structure(list(TA = c("future_perfect", "future_progressive", 
"future_simple", "past_perfect", "past_perfect_progressive", 
"past_progressive", "past_simple", "present_perfect", "present_perfect_progressive", 
"present_progressive", "present_simple", "future_perfect", "future_progressive", 
"future_simple", "past_perfect"), filename = c("BIO.G0.01.1", 
"BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", 
"BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", 
"BIO.G0.02.1", "BIO.G0.02.1", "BIO.G0.02.1", "BIO.G0.02.1"), 
    discipline = c(NA, NA, "BIO", "BIO", NA, "BIO", "BIO", "BIO", 
    "BIO", "BIO", "BIO", NA, NA, "BIO", NA), nativeness = c(NA, 
    NA, "NS", "NS", NA, "NS", "NS", "NS", "NS", "NS", "NS", NA, 
    NA, "NS", NA), year = c(NA, NA, "G0", "G0", NA, "G0", "G0", 
    "G0", "G0", "G0", "G0", NA, NA, "G0", NA), gender = c(NA, 
    NA, "F", "F", NA, "F", "F", "F", "F", "F", "F", NA, NA, "M", 
    NA), n = c(0L, 0L, 9L, 6L, 0L, 2L, 76L, 34L, 1L, 2L, 265L, 
    0L, 0L, 10L, 0L)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

I know that I can use fill() to fill up or down, but within a file some of the missing values sit above the known values and some sit below, so a single direction would not work. I also know how to fill in the columns based on the value in the filename column.

For instance:

library(dplyr)
library(stringi)
df1 %>%
  mutate(discipline = case_when(
    stri_detect_fixed(filename, "BIO") ~ "Biology",
    stri_detect_fixed(filename, "PHY") ~ "Physics"))

However, this would not work for the gender and nativeness columns, as this information is not contained in the filename.
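As a side note, if every filename follows a discipline.year.rest layout (an assumption drawn only from the two filenames in the sample data), the discipline and year codes can be pulled straight out of the filename with base R's sub(). The sketch below shows only that extraction; gender and nativeness still have to be filled from the other rows of the same file.

```r
# Sample filenames taken from the question's data
filename <- c("BIO.G0.01.1", "BIO.G0.02.1")

# Assumed layout: <discipline>.<year>.<...>
# Keep the first dot-separated field
discipline <- sub("^([^.]+)\\..*$", "\\1", filename)
# Keep the second dot-separated field
year <- sub("^[^.]+\\.([^.]+)\\..*$", "\\1", filename)

print(discipline)  # "BIO" "BIO"
print(year)        # "G0" "G0"
```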

Solution:

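We can group by filename, fill the NAs in both directions within each group with fill(.direction = "downup"), and then apply the OP's code: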

library(dplyr)
library(tidyr)
library(stringi)

df1 %>%
  group_by(filename) %>%
  # within each filename: fill NAs downward first, then upward
  fill(discipline:gender, .direction = "downup") %>%
  ungroup() %>%
  mutate(discipline = case_when(
    stri_detect_fixed(filename, "BIO") ~ "Biology",
    stri_detect_fixed(filename, "PHY") ~ "Physics"))
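Why both the grouping and the "downup" direction matter can be seen on a tiny hypothetical data set (the toy data below is made up for illustration). A single direction leaves the leading NA of a file untouched, and filling without group_by() lets a value from one file leak into the next file's rows.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: file "A" has its known value in the middle,
# file "B" has it only in its last row
toy <- data.frame(
  filename = c("A", "A", "A", "B", "B"),
  gender   = c(NA, "F", NA, NA, "M"),
  stringsAsFactors = FALSE
)

# One direction only: the first row of each file stays NA
down_only <- toy %>%
  group_by(filename) %>%
  fill(gender, .direction = "down") %>%
  ungroup()

# No grouping: "F" from file A flows into the first row of file B
ungrouped <- toy %>%
  fill(gender, .direction = "downup")

# Grouped "downup": fill down, then up, never crossing a filename boundary
grouped <- toy %>%
  group_by(filename) %>%
  fill(gender, .direction = "downup") %>%
  ungroup()

down_only$gender  # NA "F" "F" NA "M"
ungrouped$gender  # "F" "F" "F" "F" "M"
grouped$gender    # "F" "F" "F" "M" "M"
```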

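If you would rather avoid the tidyverse, a base R alternative (a different technique from the answer above) is to apply a small helper within each filename via ave(). The helper name fill_with_first is hypothetical, and the sketch assumes that all non-NA values within one filename agree, since it simply copies the first known value into the NA slots.

```r
# Hypothetical helper: replace NAs with the first non-NA value in the vector
fill_with_first <- function(v) {
  known <- v[!is.na(v)]
  if (length(known) > 0) v[is.na(v)] <- known[1]
  v
}

# Small subset shaped like the question's data
dat <- data.frame(
  filename   = c("BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1",
                 "BIO.G0.02.1", "BIO.G0.02.1"),
  nativeness = c(NA, "NS", NA, NA, "NS"),
  gender     = c(NA, "F", NA, NA, "M"),
  stringsAsFactors = FALSE
)

cols <- c("nativeness", "gender")
# ave() splits each column by filename, applies the helper, and reassembles
dat[cols] <- lapply(dat[cols], function(col) ave(col, dat$filename, FUN = fill_with_first))

dat$gender      # "F" "F" "F" "M" "M"
dat$nativeness  # "NS" "NS" "NS" "NS" "NS"
```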