I have this data frame:
library(tibble)
dataf <- tibble(A = sample(c(TRUE, FALSE), 10, replace = TRUE),
                B = sample(c(TRUE, FALSE), 10, replace = TRUE),
                C = sample(c(TRUE, FALSE), 10, replace = TRUE),
                group = c(rep("grp1", 3), rep("grp2", 3), rep("grp3", 4)))
dataf
# A tibble: 10 × 4
A B C group
<lgl> <lgl> <lgl> <chr>
1 TRUE TRUE TRUE grp1
2 FALSE TRUE TRUE grp1
3 TRUE TRUE TRUE grp1
4 TRUE TRUE TRUE grp2
5 FALSE TRUE TRUE grp2
6 TRUE FALSE TRUE grp2
7 TRUE FALSE FALSE grp3
8 TRUE FALSE TRUE grp3
9 FALSE FALSE TRUE grp3
10 FALSE FALSE FALSE grp3
I want to aggregate the rows by the variable group. If a column contains at least one TRUE within a group, the aggregated value should be TRUE, otherwise FALSE. E.g. in grp1, column A has TRUE, FALSE and TRUE. Since it contains a TRUE, the aggregate should be TRUE for grp1, column A. Similarly, grp3, column B should be FALSE, as it doesn't contain any TRUE.
The resulting data frame should look like this:
A B C group
<lgl> <lgl> <lgl> <chr>
1 TRUE TRUE TRUE grp1
2 TRUE TRUE TRUE grp2
3 TRUE FALSE TRUE grp3
Any idea how to achieve this?
Solution:
1) dplyr Using the input shown in the Note at the end, summarize by group, applying any to every logical column via across. Finally, relocate the group column so that it is the last column.
library(dplyr)
dataf %>%
summarize(across(where(is.logical), any), .by = group) %>%
relocate(group, .after = last_col())
giving
A B C group
1 TRUE TRUE TRUE grp1
2 TRUE TRUE TRUE grp2
3 TRUE FALSE TRUE grp3
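The .by argument of summarize was added in dplyr 1.1.0. For older installations, a roughly equivalent sketch uses group_by instead. The data frame is rebuilt inline here only so that the block runs on its own, and the name res is just an illustrative choice:

```r
library(dplyr)

# Same values as the input in the Note, built directly so this block is self-contained.
dataf <- data.frame(
  A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  C = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
)

res <- dataf %>%
  group_by(group) %>%                                  # one group per value of `group`
  summarize(across(where(is.logical), any),            # TRUE if any row in the group is TRUE
            .groups = "drop") %>%                      # return an ungrouped result
  relocate(group, .after = last_col())                 # move `group` to the end

res
```

The result should have the same three rows as above, returned as a tibble rather than a plain data frame.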
2) Base R Alternatively, using only base R, aggregate by group with any and then reorder the columns so that group comes last:
aggregate(. ~ group, dataf, any)[c(2:4, 1)]
giving
A B C group
1 TRUE TRUE TRUE grp1
2 TRUE TRUE TRUE grp2
3 TRUE FALSE TRUE grp3
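To make the one-liner easier to follow, here is a sketch of the same idea in two steps, using the non-formula interface of aggregate. The intermediate names agg and res are illustrative only, and the data frame is rebuilt inline so the block runs on its own:

```r
# Same values as the input in the Note, built directly so this block is self-contained.
dataf <- data.frame(
  A = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  C = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE),
  group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
)

# Step 1: apply any() to A, B and C within each group.
# aggregate() places the grouping column first in its result.
agg <- aggregate(dataf[c("A", "B", "C")], by = dataf["group"], FUN = any)

# Step 2: reorder the columns so that `group` comes last.
res <- agg[c("A", "B", "C", "group")]

res
```

One point worth keeping in mind: the formula interface used in the one-liner defaults to na.action = na.omit, so rows containing NA are dropped before any is applied. That makes no difference for this input, which has no missing values.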
Note
dataf as produced by the code in the question is not reproducible, since it uses random numbers without set.seed(...), so we have used the following instead.
Lines <- " A B C group
1 TRUE TRUE TRUE grp1
2 FALSE TRUE TRUE grp1
3 TRUE TRUE TRUE grp1
4 TRUE TRUE TRUE grp2
5 FALSE TRUE TRUE grp2
6 TRUE FALSE TRUE grp2
7 TRUE FALSE FALSE grp3
8 TRUE FALSE TRUE grp3
9 FALSE FALSE TRUE grp3
10 FALSE FALSE FALSE grp3 "
dataf <- read.table(text = Lines)
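For completeness, the question's random input could also be made reproducible by setting a seed first. This is only a sketch: the seed value 123 and the name dataf_seeded are arbitrary choices, data.frame is used instead of tibble to avoid an extra package, and the values it generates will generally differ from the table shown in the question.

```r
# Fix the random number generator state so repeated runs give the same draws.
set.seed(123)  # arbitrary seed, chosen only for illustration

dataf_seeded <- data.frame(
  A = sample(c(TRUE, FALSE), 10, replace = TRUE),
  B = sample(c(TRUE, FALSE), 10, replace = TRUE),
  C = sample(c(TRUE, FALSE), 10, replace = TRUE),
  group = rep(c("grp1", "grp2", "grp3"), times = c(3, 3, 4))
)

dataf_seeded
```

Anyone running this with the same seed and the same R version should get the same data frame, which makes results easier to compare.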