Handling missing values R

I used the group_by function in R as follows:

data = r %>%
  group_by(Name, yp) %>%
  summarise(nb = n()) %>%
  mutate(Frac = nb / sum(nb))

This is what I get:

Name   yp   nb   Frac
0_S    0    1    0.03030303
0_S    1    20   0.60606061
0_S    2    12   0.36363636
1_S    1    16   0.59259259
1_S    2    11   0.40740741

Each Name should have three yp values (0, 1, 2). But when a Name/yp combination has no rows in the data, that row is missing from the result instead of appearing with a count of 0.
Here is what I would like: for example, since yp = 0 is missing for 1_S, a 1_S / 0 row is added with nb = 0 and Frac = 0.

Name   yp   nb   Frac
0_S    0    1    0.03030303
0_S    1    20   0.60606061
0_S    2    12   0.36363636
1_S    0    0    0
1_S    1    16   0.59259259
1_S    2    11   0.40740741

Reproducible example:

library(dplyr)

Df <- data.frame(A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
                 B = c(0,0,1,1,2,2,1,1,1,1,2,2),
                 C = c(0,0,1,1,2,2,0,0,1,1,2,2))
Df

# Count rows per (A, B) pair, then compute each pair's share within A
DDf = Df %>%
  group_by(A,B) %>%
  summarise(n = n()) %>%
  mutate(Frac = n / sum(n))

head(DDf)

Solution:

You can use tidyr::complete after ungrouping, filling the newly added rows with 0:

library(tidyverse)

DDf %>%
  ungroup() %>% 
  complete(A, B, fill = list(n = 0, Frac = 0))

# A tibble: 6 x 4
  A         B     n  Frac
  <chr> <dbl> <dbl> <dbl>
1 0_S       0     2 0.333
2 0_S       1     2 0.333
3 0_S       2     2 0.333
4 1_S       0     0 0    
5 1_S       1     4 0.667
6 1_S       2     2 0.333
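
As a variation on the same idea, the missing combinations can be filled in before the fraction is computed, so that only the count needs a fill value. The following is a minimal sketch, not the answer's own code: it reuses the A and B columns of the example Df, and the name res is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same A and B columns as the reproducible example
Df <- data.frame(
  A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
  B = c(0,0,1,1,2,2,1,1,1,1,2,2)
)

# `res` is a hypothetical name used only in this sketch
res <- Df %>%
  count(A, B) %>%                          # one row per observed (A, B) pair, ungrouped
  complete(A, B, fill = list(n = 0L)) %>%  # add unobserved pairs with a count of 0
  group_by(A) %>%
  mutate(Frac = n / sum(n)) %>%            # share of each B within its A
  ungroup()

res
```

Because Frac is derived after complete(), the added 1_S / 0 row gets Frac = 0 from the division itself rather than from a second fill value.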

Data:

Df <- data.frame(A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
                 B = c(0,0,1,1,2,2,1,1,1,1,2,2),
                 C = c(0,0,1,1,2,2,0,0,1,1,2,2))
DDf = Df %>%
  group_by(A,B) %>%
  summarise(n = n()) %>%
  mutate(Frac = n / sum(n))
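
An alternative that stays within dplyr is to turn the grouping columns into factors and ask group_by to keep empty groups with .drop = FALSE. This is only a sketch under assumptions: the levels 0:2 are taken from the question's statement that each name has the values 0, 1, 2, the name res2 is made up for illustration, and the A and B columns come back as factors rather than character and numeric.

```r
library(dplyr)

# Same A and B columns as the reproducible example
Df <- data.frame(
  A = c('0_S','0_S','0_S','0_S','0_S','0_S','1_S','1_S','1_S','1_S','1_S','1_S'),
  B = c(0,0,1,1,2,2,1,1,1,1,2,2)
)

# `res2` is a hypothetical name used only in this sketch
res2 <- Df %>%
  mutate(A = factor(A),
         B = factor(B, levels = 0:2)) %>%        # assumed full set of B values
  group_by(A, B, .drop = FALSE) %>%              # keep factor combinations with no rows
  summarise(n = n(), .groups = "drop_last") %>%  # empty groups get n = 0; still grouped by A
  mutate(Frac = n / sum(n)) %>%
  ungroup()

res2
```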