Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to count string numbers row-wisely and count percentage (only not empty columns taking into account)

For the following dataframe:

df <- structure(list(test = c("A", "B", "C"), `2019` = c("true", "", 
"false"), `2020` = c("false", "true", "true"), `2021` = c("true", 
"false", "true"), `2022` = c("", "false", "false")), class = "data.frame", row.names = c(NA, 
-3L))

Out:

  test  2019  2020  2021  2022
1    A  true false  true      
2    B        true false false
3    C false  true  true false

I need to count number of trues from column 2019 to 2022, then calculate percentage of number of true from not NA or not empty values:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The expected result could be like this:

  test  2019  2020  2021  2022 true_pct
1    A  true false  true           0.67 # 2/3
2    B        true false false     0.33 # 1/3
3    C false  true  true false     0.50 # 2/4

The code below returns wrong answer since it take all year columns since there are empty string in the data, but I need to filter mask such as !is.na(df[,2:5]) | df[,2:5] != '':

df$count <- rowSums(df[-1] == "true")
df$not_na <- rowSums(!is.na(df[,2:5]))
# df$not_na <- rowSums(!complete.cases(df[,2:5]))
df$true_pct <- df$count/df$not_na

Out:

  test  2019  2020  2021  2022 count not_na true_pct
1    A  true false  true           2      4     0.50
2    B        true false false     1      4     0.25
3    C false  true  true false     2      4     0.50

Please share better solutions, thanks.

>Solution :

You can use –

#To select only required columns
tmp <- df[-1]
#Total true values divided by total values which are not `NA` or empty.
df$true_pct <- rowSums(tmp == "true")/rowSums(tmp != "" & !is.na(tmp))
df

#  test  2019  2020  2021  2022  true_pct
#1    A  true false  true       0.6666667
#2    B        true false false 0.3333333
#3    C false  true  true false 0.5000000
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading