Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I add a new column through a conditional mutate but preserve the original dataframe?

I have a large dataframe (df) containing 500+ rows, 50+ columns/variables but only want to target specific variables.

targ_vars <- c("all3a1", "3a1_arc",
              "all3b1", "3b1_arc",
              "all3c1", "3c1_arc")

The vector above contains the variables which have frequency data (i.e. multiple rows with 1,2,3 etc.)

I want to add a new count column in the original large dataframe (df) which contains the row sum of any non-NA value specifically for those select variables in "targ_vars".

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Again, I’m not trying to add the value of the actual frequency data across each of those variables, but moreso just a sum of any non-NA value per row (i.e. 1,2,NA,7,NA,1 = total row count of 4 non-NA).

I’ve gotten as far as this:

df <- df %>%
        select(targ_vars) %>%
        mutate(targ_var_count = rowSums(!is.na(.), na.rm = TRUE))

The problem is I’m not sure how to "deselect" the variables I used to run the mutate calculation. The line above would result in overwriting the entire original dataframe (df) containing 50+ columns/vars, and placing back only the selected 6 variables in (targ_vars) plus the new (targ_var_count) variable that mutate calculated.

Essentially, I just want to focus on that last mutate line, and plop that new count column back into the original (df).

I tried something like the one below but it ended up giving me a list when I call "df$targcount" instead of just 1 rowSum column:

df$targcount <- df %>%
    select(targ_vars) %>%
    mutate(targcount = rowSums(!is.na(.), na.rm = TRUE))

Any help/tips would be appreciated.

>Solution :

You could use dplyr::across to get the count of non NA values for just your targ_vars columns.

Using some fake random example data:

set.seed(123)

dat <- data.frame(
  a = sample(c(0, NA), 10, replace = TRUE),
  b = sample(c(0, NA), 10, replace = TRUE),
  c = sample(c(0, NA), 10, replace = TRUE),
  d = sample(c(0, NA), 10, replace = TRUE)
)

targ_vars <- c("c", "d")

library(dplyr, w = FALSE)

dat %>%
  mutate(targcount = rowSums(across(all_of(targ_vars), ~ !is.na(.x))))
#>     a  b  c  d targcount
#> 1   0 NA  0  0         2
#> 2   0 NA NA NA         0
#> 3   0 NA  0  0         2
#> 4  NA  0  0 NA         1
#> 5   0 NA  0 NA         1
#> 6  NA  0  0  0         2
#> 7  NA NA NA  0         1
#> 8  NA  0 NA  0         1
#> 9   0  0  0  0         2
#> 10  0  0 NA NA         0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading