For the following dataframe:
df <- structure(list(test = c("A", "B", "C"), `2019` = c("true", "",
"false"), `2020` = c("false", "true", "true"), `2021` = c("true",
"false", "true"), `2022` = c("", "false", "false")), class = "data.frame", row.names = c(NA,
-3L))
Out:
test 2019 2020 2021 2022
1 A true false true
2 B true false false
3 C false true true false
I need to count number of trues from column 2019 to 2022, then calculate percentage of number of true from not NA or not empty values:
The expected result could be like this:
test 2019 2020 2021 2022 true_pct
1 A true false true 0.67 # 2/3
2 B true false false 0.33 # 1/3
3 C false true true false 0.50 # 2/4
The code below returns wrong answer since it take all year columns since there are empty string in the data, but I need to filter mask such as !is.na(df[,2:5]) | df[,2:5] != '':
df$count <- rowSums(df[-1] == "true")
df$not_na <- rowSums(!is.na(df[,2:5]))
# df$not_na <- rowSums(!complete.cases(df[,2:5]))
df$true_pct <- df$count/df$not_na
Out:
test 2019 2020 2021 2022 count not_na true_pct
1 A true false true 2 4 0.50
2 B true false false 1 4 0.25
3 C false true true false 2 4 0.50
Please share better solutions, thanks.
>Solution :
You can use –
#To select only required columns
tmp <- df[-1]
#Total true values divided by total values which are not `NA` or empty.
df$true_pct <- rowSums(tmp == "true")/rowSums(tmp != "" & !is.na(tmp))
df
# test 2019 2020 2021 2022 true_pct
#1 A true false true 0.6666667
#2 B true false false 0.3333333
#3 C false true true false 0.5000000