I am building a large data frame with mutate() and lots of ifelse() conditions. I don't name the columns inside mutate(), because I have many hundreds of these conditions and every time I update one I would then have to update all the names. Instead, I want to name the columns after the operation, outside of mutate().
Here is some code outlining what I'm trying to do:
library(dplyr)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))
df2 <- df %>%
  mutate(
    # condition 1
    ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
    # condition 2
    ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
           (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
    # condition 3
    ifelse(a < b, 1, 0),
    .keep = 'none'
  )
# name the columns df1, df2, ... after the fact
c_names <- paste0('df', seq_len(ncol(df2)))
colnames(df2) <- c_names
The trouble is that mutate() abbreviates the automatically generated names of the long ifelse() expressions (conditions 1 and 2) to ifelse(...). Because both end up with the same name, condition 2 overwrites condition 1, so I get only 2 columns instead of 3.
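To see why the two long conditions collide, you can call rlang::as_label() on the expressions directly. As far as I can tell this is the helper dplyr relies on to auto-name unnamed mutate() arguments (treat that as an assumption about dplyr's internals). It keeps short calls verbatim but abbreviates a call that doesn't deparse onto a single line to fn(...), so every long ifelse() gets the same label:

```r
library(rlang)

# Long expression shaped like condition 1: too wide to deparse on one line
long1 <- quote(
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)), 1, 0)
)
# A different long expression shaped like condition 2
long2 <- quote(
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0)
)
# Short expression (condition 3): fits on one line, so it is kept in full
short <- quote(ifelse(a < b, 1, 0))

as_label(long1)  # "ifelse(...)"
as_label(long2)  # "ifelse(...)", identical to long1, so one column replaces the other
as_label(short)  # "ifelse(a < b, 1, 0)"
```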
Is there something I can do to prevent this behaviour, or a more efficient way of achieving what I'm trying to do? I want to avoid manually typing out hundreds of column names every time I need to update the data frame. Ideally, I would also be able to map each condition back to its new column name, for example:
df3 = ifelse(a < b, 1, 0)
This mapping only works if mutate() keeps one column per condition instead of collapsing the column names.
Solution:
You could give each expression a unique, random column name, for example a UUID generated with the uuid package, and rename the columns afterwards:
library(dplyr)
set.seed(123)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))
df2 <- df %>%
  mutate(
    # condition 1
    "{uuid::UUIDgenerate()}" :=
      ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
    # condition 2
    "{uuid::UUIDgenerate()}" :=
      ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
             (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
    # condition 3
    "{uuid::UUIDgenerate()}" :=
      ifelse(a < b, 1, 0),
    .keep = 'none'
  )
str(df2)
#> 'data.frame': 20 obs. of 3 variables:
#> $ 2175b2b7-511f-471a-94d5-d82116b12137: num NA NA NA 0 1 1 0 0 1 1 ...
#> $ 07e353a6-58b9-4c50-9c08-2b7c742cf28b: num NA NA NA 0 NA NA 0 0 1 1 ...
#> $ a4fb004b-f498-4da0-b60b-1fbf872670a5: num 0 1 0 0 0 0 1 1 0 1 ...
c_names <- paste0('df', seq_len(ncol(df2)))
colnames(df2) <- c_names
str(df2)
#> 'data.frame': 20 obs. of 3 variables:
#> $ df1: num NA NA NA 0 1 1 0 0 1 1 ...
#> $ df2: num NA NA NA 0 NA NA 0 0 1 1 ...
#> $ df3: num 0 1 0 0 0 0 1 1 0 1 ...
Created on 2024-01-30 with reprex v2.0.2
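An alternative that avoids random names altogether is to store the conditions in a named list of unevaluated expressions (rlang::exprs()) and splice that list into mutate() with !!!. The list names become the column names, so the list doubles as the mapping from condition to column. This is a sketch rather than part of the answer above; the df1, df2, ... scheme is simply the one from the question, and the lookup table at the end is an illustrative extra.

```r
library(dplyr)
library(rlang)

set.seed(123)
df <- data.frame(a = rnorm(20, 100, 1), b = rnorm(20, 100, 1), c = rnorm(20, 100, 1))

# Each condition is written once, as an unevaluated expression
conds <- exprs(
  # condition 1
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)), 1, 0),
  # condition 2
  ifelse((lag(a, 1) - lag(c, 1)) < (lag(a, 2) - lag(b, 2)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 3) - lag(b, 3)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 4) - lag(b, 4)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 5) - lag(b, 5)) &
         (lag(a, 1) - lag(c, 1)) < (lag(a, 6) - lag(b, 6)), 1, 0),
  # condition 3
  ifelse(a < b, 1, 0)
)

# Name each condition by its position: df1, df2, df3, ...
names(conds) <- paste0("df", seq_along(conds))

# Splice the named list into mutate(): one new column per list element
df2 <- df %>% mutate(!!!conds, .keep = "none")

# Lookup table mapping each column name back to the expression that built it
lookup <- data.frame(
  column = names(conds),
  condition = vapply(conds, expr_text, character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

Adding, removing or reordering a condition then only means editing the exprs() call; the names are regenerated on the next run.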