How to fill in NA values according to a categorical variable in R

I have some NA values in my data that I would like to fill in according to the filename column. In other words, every observation with the same filename should have the same values in the discipline, nativeness, year, and gender columns.

structure(list(TA = c("future_perfect", "future_progressive", 
"future_simple", "past_perfect", "past_perfect_progressive", 
"past_progressive", "past_simple", "present_perfect", "present_perfect_progressive", 
"present_progressive", "present_simple", "future_perfect", "future_progressive", 
"future_simple", "past_perfect"), filename = c("BIO.G0.01.1", 
"BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", 
"BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", "BIO.G0.01.1", 
"BIO.G0.02.1", "BIO.G0.02.1", "BIO.G0.02.1", "BIO.G0.02.1"), 
    discipline = c(NA, NA, "BIO", "BIO", NA, "BIO", "BIO", "BIO", 
    "BIO", "BIO", "BIO", NA, NA, "BIO", NA), nativeness = c(NA, 
    NA, "NS", "NS", NA, "NS", "NS", "NS", "NS", "NS", "NS", NA, 
    NA, "NS", NA), year = c(NA, NA, "G0", "G0", NA, "G0", "G0", 
    "G0", "G0", "G0", "G0", NA, NA, "G0", NA), gender = c(NA, 
    NA, "F", "F", NA, "F", "F", "F", "F", "F", "F", NA, NA, "M", 
    NA), n = c(0L, 0L, 9L, 6L, 0L, 2L, 76L, 34L, 1L, 2L, 265L, 
    0L, 0L, 10L, 0L)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

I know that I can fill() up or down, but some of the missing values are above the known values and some are below, so filling in a single direction would not work. I also know how to fill in a column based on the value in the filename column.

For instance:

mutate(discipline = case_when(
  stri_detect_fixed(filename, "BIO") ~ "Biology",
  stri_detect_fixed(filename, "PHY") ~ "Physics"))

However, this would not work for the gender and nativeness columns, as this info is not contained in the filename.

Solution:

We may group by filename, fill the missing values in both directions within each group, and then apply the OP's code

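library(dplyr)
library(stringi)
library(tidyr)

# df1 is the data frame from the question
df1 %>%
  group_by(filename) %>%
  # fill NAs downward, then upward, within each filename
  fill(discipline:gender, .direction = "downup") %>%
  ungroup() %>%
  # recode discipline from the filename (OP's code)
  mutate(discipline = case_when(
    stri_detect_fixed(filename, "BIO") ~ "Biology",
    stri_detect_fixed(filename, "PHY") ~ "Physics"))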
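
To see what the grouped fill does on its own, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and only mimic the shape of the question's data: NAs appear both above and below the known value within a file. It assumes a tidyr version that supports .direction = "downup".

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two files, with NAs above and below the known values
toy <- tibble(
  filename = c("A.1", "A.1", "A.1", "B.1", "B.1"),
  gender   = c(NA, "F", NA, "M", NA)
)

filled <- toy %>%
  group_by(filename) %>%
  # within each filename, fill downward first, then upward
  fill(gender, .direction = "downup") %>%
  ungroup()

filled$gender
# "F" "F" "F" "M" "M"
```

Because the fill runs inside each filename group, a value never leaks from one file into the next. This is why it also works for columns such as gender and nativeness, which cannot be derived from the filename.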