Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

filling missing values using tidyverse syntax based on conditions within the tibble

I’m having difficulty filling missing values in my tibble based on some conditions.

I’m trying to fill NA values in the manufacturer column based on the condition field either being YES or NO, I’m only interested in filling NA values for groups where the condition column contains YES.

I’m getting the following error which I can’t figure out:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

no applicable method for ‘fill’ applied to an object of class
"character"

Here’s a simple reprex which hopefully provides some context to the problem I’m trying to solve.

df <- tibble(
  code = c("A", "A", "A", "A", "B", "B", "B", "B", "B"),
  cost = c(5000, 4000, 3000, 2000, 40000, 30000, 20000, 10000, 5000),
  manufacturer = c("ManA", NA, NA, NA, "ManB", "ManB", NA, NA, "ManB"),
  condition = c("NO", "NO", "NO", "NO", "YES", "YES", "YES", "YES", "YES")
) %>%
  group_by(code) %>%
  arrange(desc(cost), .by_group = TRUE) %>%
  mutate(manufacturer = if_else(condition == "YES", fill(manufacturer, .direction = "down"), manufacturer))

I’ve included the mutate in the last line to highlight what I’ve tried so far to solve for this but is causing the error above.

Any help that can be provided would be greatly appreciated.

>Solution :

You can create a copy of manufacturer column, fill everything and then based on condition assign the original values back.

library(dplyr)
library(tidyr)

df %>%
  #Create a copy of the column
  mutate(new_manufacturer = manufacturer) %>%
  group_by(code) %>%
  arrange(desc(cost), .by_group = TRUE) %>%
  # Fill everything
  fill(new_manufacturer, .direction = "down") %>%
  # Keep the new value only if condition is "YES"
  mutate(new_manufacturer = if_else(condition == "YES", 
                 new_manufacturer, manufacturer)) %>%
  ungroup()

#  code   cost manufacturer condition new_manufacturer
#  <chr> <dbl> <chr>        <chr>     <chr>           
#1 A      5000 ManA         NO        ManA            
#2 A      4000 NA           NO        NA              
#3 A      3000 NA           NO        NA              
#4 A      2000 NA           NO        NA              
#5 B     40000 ManB         YES       ManB            
#6 B     30000 ManB         YES       ManB            
#7 B     20000 NA           YES       ManB            
#8 B     10000 NA           YES       ManB            
#9 B      5000 ManB         YES       ManB            

new_manufacturer column is the output that you are looking for.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading