Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count number of filled out columns in R dataframe to create new column

I have a dataframe df_team which has multiple Team-Columns, of which of course not all have values. The df looks like this, just with over 100 of these Team_ columns:

dput(df_team)

structure(list(Project = c("etwbv", "werg", "sdfg", "qwreg", 
"cae", "refdc"), Team_1 = c("ewrg", "werg", "asd", "qwe", NA, 
"vsfd"), Team_URL_1 = c("abc", "bfh", "fse", "rege", NA, "vsefr"
), Team_2 = c("abc1", "bfh", "fse", "rege1", NA, NA), Team_URL_2 = c("abc", 
"bfh", "fse", "rege", NA, NA), Team_3 = c("abc1", "bfh", NA, 
NA, NA, NA), Team_URL_3 = c("abc", "bfh", NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-6L))

I want to create a new column Team_Count which counts the filled Team_x columns and disregards the NAs, so that I know how many team members one ““Project“` has.

The new dataframe should look like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Name   Team_1  Team_URL_1 Team_2  Team_URL_2  Team_3  Team_URL_3 Team_Count
etwbv  ewrg    abc        abc1    abc         abc1    abc        3
werg   werg    bfh        bfh     bfh         bfh     bfh        3
sdfg   asd     fse        fse     fse         NA      NA         2
qwreg  qwe     rege       rege1   rege        NA      NA         2
cae    NA      NA         NA      NA          NA      NA         0
refdc  vsfd    vsefr      NA      NA          NA      NA         1

>Solution :

library(dplyr)

df %>% 
  mutate(Team_Count = rowSums(!is.na(pick(matches("Team_\\d+")))))

If you are using dplyr < 1.1.0 then replace pick with across.

This works by creating a data frame with variables that match the regular expression "Team_\\d+" ("Team_" followed by one or more digits), creating logical values by testing if values are NA, and then summing the number of columns that are not NA by row.

Output

  Project Team_1 Team_URL_1 Team_2 Team_URL_2 Team_3 Team_URL_3 Team_Count
1   etwbv   ewrg        abc   abc1        abc   abc1        abc          3
2    werg   werg        bfh    bfh        bfh    bfh        bfh          3
3    sdfg    asd        fse    fse        fse   <NA>       <NA>          2
4   qwreg    qwe       rege  rege1       rege   <NA>       <NA>          2
5     cae   <NA>       <NA>   <NA>       <NA>   <NA>       <NA>          0
6   refdc   vsfd      vsefr   <NA>       <NA>   <NA>       <NA>          1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading