I have a dataframe df_team which has multiple Team-Columns, of which of course not all have values. The df looks like this, just with over 100 of these Team_ columns:
dput(df_team)
structure(list(Project = c("etwbv", "werg", "sdfg", "qwreg",
"cae", "refdc"), Team_1 = c("ewrg", "werg", "asd", "qwe", NA,
"vsfd"), Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"
), Team_2 = c("abc1", "bfh", "fse", "rege1", NA, NA), Team_URL_2 = c("abc",
"bfh", "fse", "rege", NA, NA), Team_3 = c("abc1", "bfh", NA,
NA, NA, NA), Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
I want to create a new column Team_Count which counts the filled Team_x columns and disregards the NAs, so that I know how many team members one ““Project“` has.
The new dataframe should look like this:
Name Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
etwbv ewrg abc abc1 abc abc1 abc 3
werg werg bfh bfh bfh bfh bfh 3
sdfg asd fse fse fse NA NA 2
qwreg qwe rege rege1 rege NA NA 2
cae NA NA NA NA NA NA 0
refdc vsfd vsefr NA NA NA NA 1
>Solution :
library(dplyr)
df %>%
mutate(Team_Count = rowSums(!is.na(pick(matches("Team_\\d+")))))
If you are using dplyr < 1.1.0 then replace pick with across.
This works by creating a data frame with variables that match the regular expression "Team_\\d+" ("Team_" followed by one or more digits), creating logical values by testing if values are NA, and then summing the number of columns that are not NA by row.
Output
Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
1 etwbv ewrg abc abc1 abc abc1 abc 3
2 werg werg bfh bfh bfh bfh bfh 3
3 sdfg asd fse fse fse <NA> <NA> 2
4 qwreg qwe rege rege1 rege <NA> <NA> 2
5 cae <NA> <NA> <NA> <NA> <NA> <NA> 0
6 refdc vsfd vsefr <NA> <NA> <NA> <NA> 1