I have a data frame with ninety columns (only 6 in the example below) and multiple rows. I would like to sum by rows over all the columns, but only when the value is 4 or 5. In the result I would like the number of times the condition is met. I don't know how to add this condition to the following code.
Any help is welcome, thank you.
df = data.frame(A = c(1,2,2,5,5,2,2,5,5,4,4,4,2,2,5),
                B = c(2,2,2,5,5,4,2,3,1,4,1,4,5,2,5),
                C = c(5,3,3,5,5,2,1,5,5,1,1,4,1,2,5),
                D = c(4,5,5,3,5,5,1,5,5,4,2,3,2,2,1),
                E = c(1,2,2,2,5,2,5,3,3,4,4,4,2,2,1),
                H = c(1,2,2,3,5,2,2,5,3,1,4,4,4,2,3))
> df
A B C D E H
1 1 2 5 4 1 1
2 2 2 3 5 2 2
3 2 2 3 5 2 2
4 5 5 5 3 2 3
5 5 5 5 5 5 5
6 2 4 2 5 2 2
7 2 2 1 1 5 2
8 5 3 5 5 3 5
9 5 1 5 5 3 3
10 4 4 1 4 4 1
11 4 1 1 2 4 4
12 4 4 4 3 4 4
13 2 5 1 2 2 4
14 2 2 2 2 2 2
15 5 5 5 1 1 3
library(dplyr)
df %>%
  mutate(sum = rowSums(across(where(is.numeric))))
Solution:
A base R solution (it sums the values equal to 4 or 5 in each row):
df$sum_4_5 <- apply(df, 1, \(x) sum(x[x %in% 4:5]))
Its dplyr equivalent:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(sum_4_5 = c_across(everything()) %>% { sum(.[. %in% 4:5]) }) %>%
  ungroup()
# # A tibble: 15 × 7
# A B C D E H sum_4_5
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2 5 4 1 1 9
# 2 2 2 3 5 2 2 5
# 3 2 2 3 5 2 2 5
# 4 5 5 5 3 2 3 15
# 5 5 5 5 5 5 5 30
# ...
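If you want the sum rather than the count, `rowwise()` can be avoided: it evaluates the expression once per row, which is typically slower on large data. The sketch below is not part of the original answer and assumes every column to include is numeric. It multiplies each cell by the result of `%in%`, so non-matching cells become 0 before `rowSums()` adds them up:

```r
library(dplyr)

df <- data.frame(A = c(1,2,2,5,5,2,2,5,5,4,4,4,2,2,5),
                 B = c(2,2,2,5,5,4,2,3,1,4,1,4,5,2,5),
                 C = c(5,3,3,5,5,2,1,5,5,1,1,4,1,2,5),
                 D = c(4,5,5,3,5,5,1,5,5,4,2,3,2,2,1),
                 E = c(1,2,2,2,5,2,5,3,3,4,4,4,2,2,1),
                 H = c(1,2,2,3,5,2,2,5,3,1,4,4,4,2,3))

# .x %in% 4:5 is TRUE/FALSE; multiplying coerces it to 1/0, so values
# other than 4 or 5 are zeroed out and rowSums() adds only the 4s and 5s.
res <- df %>%
  mutate(sum_4_5 = rowSums(across(where(is.numeric), ~ .x * (.x %in% 4:5))))

head(res$sum_4_5, 5)
```

On the example data the first five rows give 9, 5, 5, 15 and 30, matching the `rowwise()` output.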
Update
If you want the number of 4s and 5s in each row, the code is simpler:
df %>%
  mutate(sum_4_5 = rowSums(across(where(is.numeric), ~ .x %in% 4:5)))
#    A B C D E H sum_4_5
# 1  1 2 5 4 1 1       2
# 2  2 2 3 5 2 2       1
# 3  2 2 3 5 2 2       1
# 4  5 5 5 3 2 3       3
# 5  5 5 5 5 5 5       6
# ...
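The same count can be obtained in base R without dplyr. This is a sketch that assumes `df` holds only the columns to check; with ninety columns you would subset them first (e.g. `df[, cols]`, where `cols` is a hypothetical vector of column names). Comparing a data frame with `==` returns a logical matrix, and `rowSums()` counts the `TRUE` cells in each row:

```r
df <- data.frame(A = c(1,2,2,5,5,2,2,5,5,4,4,4,2,2,5),
                 B = c(2,2,2,5,5,4,2,3,1,4,1,4,5,2,5),
                 C = c(5,3,3,5,5,2,1,5,5,1,1,4,1,2,5),
                 D = c(4,5,5,3,5,5,1,5,5,4,2,3,2,2,1),
                 E = c(1,2,2,2,5,2,5,3,3,4,4,4,2,2,1),
                 H = c(1,2,2,3,5,2,2,5,3,1,4,4,4,2,3))

# TRUE where a cell equals 4 or 5, FALSE elsewhere (a 15 x 6 logical matrix)
is_4_5 <- df == 4 | df == 5

# rowSums() treats TRUE as 1, so this is the number of matches per row;
# unname() drops the row names that the matrix carries over
df$n_4_5 <- unname(rowSums(is_4_5))

head(df$n_4_5, 5)
```

For the first five rows this gives 2, 1, 1, 3 and 6, the same counts as the dplyr version.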