Append a row with the averages of selected columns and a conditional percentage for another column

I have the data frame df below. I need to calculate the percentage of Pass rows after excluding the rows whose status is -, along with the average values of pred1 and pred2:

df <- data.frame(
  name = c('A', 'B', 'C', 'D', 'E'), 
  status = c('Pass', 'Fail', '-', 'Pass', 'Pass'), 
  real = c(10, NA, 8, 9, 4), 
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6)
)

df:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6

The expected result:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6
6 total  0.75   NA 23.75   9.5

I thought of computing the values below and binding them to df as a new row, but that is neither concise nor elegant:

library(dplyr)

pass_percent <- nrow(df %>% filter(status == 'Pass')) / nrow(df %>% filter(status != '-'))
avg_pred1 <- mean(df$pred1, na.rm = TRUE)
avg_pred2 <- mean(df$pred2, na.rm = TRUE)

How could I achieve that more concisely with R's pipe?

Solution:

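How about tibble::add_row()?

library(dplyr)   # provides %>%
library(tibble)  # provides add_row()

df %>% 
  add_row(name = "total",
          # share of "Pass" among rows whose status is not "-", stored as text
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          real = mean(df$real),  # NA, because real contains an NA
          pred1 = mean(df$pred1, na.rm = TRUE),
          pred2 = mean(df$pred2, na.rm = TRUE))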

   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5
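
As a possible variant, the total row can be built with summarise() and then appended with bind_rows(), which avoids repeating df$ inside the pipe. This is only a sketch: it assumes dplyr 1.0 or later for across(), and total_row and result are names made up for illustration.

```r
library(dplyr)

df <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  status = c("Pass", "Fail", "-", "Pass", "Pass"),
  real = c(10, NA, 8, 9, 4),
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6)
)

# Build the one-row summary first (total_row is a hypothetical name).
total_row <- df %>%
  summarise(
    # Average the prediction columns, ignoring NA values.
    across(c(pred1, pred2), ~ mean(.x, na.rm = TRUE)),
    # Share of "Pass" among rows whose status is not "-", stored as text
    # so that it can live in the character column status.
    status = as.character(mean(status[status != "-"] == "Pass")),
    name = "total"
  )

# bind_rows() matches columns by name; real is absent from total_row,
# so it is filled with NA in the appended row.
result <- bind_rows(df, total_row)
print(result)
```

Because real is simply left out of the summary, its total is NA, which matches the expected result in the question.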

Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):

  • df$status[df$status != "-"] is df$status with the "-" elements removed, so only "Pass" and "Fail" remain.
  • df$status[df$status != "-"] == "Pass" is a logical vector: TRUE where the remaining status is "Pass", FALSE otherwise.
  • mean(...) works because TRUE and FALSE are coerced to 1 and 0, so the mean is the proportion of "Pass" values.
  • as.character(...) is needed because df$status is a character column, so the value added to it has to be a character string as well.
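
The steps in the list above can be traced one at a time in base R. This is a small illustration using the status values from the question; the intermediate names (kept, is_pass, pass_share, pass_label) are invented here for readability.

```r
status <- c("Pass", "Fail", "-", "Pass", "Pass")

# Step 1: drop the "-" entries.
kept <- status[status != "-"]            # "Pass" "Fail" "Pass" "Pass"

# Step 2: compare each remaining entry with "Pass".
is_pass <- kept == "Pass"                # TRUE FALSE TRUE TRUE

# Step 3: the mean of a logical vector is the proportion of TRUE values.
pass_share <- mean(is_pass)              # 0.75

# Step 4: convert to text so it fits in a character column.
pass_label <- as.character(pass_share)   # "0.75"

print(pass_label)
```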