Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to add a column next to another consecutively without using loops in R?

I have this dataframe:

set.seed(42)  ## for sake of reproducibility
df <- data.frame("time"=c(1:40), "Var1"=sample(1:100, size=40), 
                 "Var2"=sample(1:100, size=40), "Var3"=sample(1:100, size=40))
head(df)
#   time Var1 Var2 Var3
# 1    1   49    3   38
# 2    2   65   21    1
# 3    3   25    2   13
# 4    4   74   58   78
# 5    5   18   10    5
# 6    6  100   40   73

As I want to get the quantiles per column, I have this code:

(quantiles <- as.data.frame(apply(df[2:4] , 2 , quantile, probs=seq(0, 1, 1/10), na.rm=TRUE)))
#       Var1 Var2  Var3
# 0%     2.0  2.0   1.0
# 10%    5.9  8.9  11.4
# 20%   19.6 17.6  15.8
# 30%   25.7 31.1  28.1
# 40%   35.2 41.2  35.8
# 50%   42.5 51.0  42.5
# 60%   53.2 57.4  56.4
# 70%   67.3 70.2  66.0
# 80%   80.8 80.4  78.6
# 90%   89.4 90.5  90.1
# 100% 100.0 99.0 100.0

My objective is to add into my original dataframe (df) a column with each of the quantiles per variable. In order to achieve that, I have this code:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df$QuantVar1 <- .bincode(x=df$Var1, breaks=quantiles$Var1, include.lowest=T, right=T)
df$QuantVar2 <- .bincode(x=df$Var2, breaks=quantiles$Var2, include.lowest=T, right=T)
df$QuantVar3 <- .bincode(x=df$Var3, breaks=quantiles$Var3, include.lowest=T, right=T)

head(df)
#   time Var1 Var2 Var3 QuantVar1 QuantVar2 QuantVar3
# 1    1   49    3   38         6         1         5
# 2    2   65   21    1         7         3         1
# 3    3   25    2   13         3         1         2
# 4    4   74   58   78         8         7         8
# 5    5   18   10    5         2         2         1
# 6    6  100   40   73        10         4         8

(Note that I use .bincode because I didn’t have unique breaks and I found this solution).

As I want each new column next to the original variable, I relocate each of them manually:

library(dplyr); library(tidyft)

df <- df %>%
  relocate(QuantVar1, .after  = Var1)
df <- df %>%
  relocate(QuantVar2, .after  = Var2)
df <- df %>%
  relocate(QuantVar3, .after  = Var3)

head(df)
#   time Var1 QuantVar1 Var2 QuantVar2 Var3 QuantVar3
# 1    1   49         6    3         1   38         5
# 2    2   65         7   21         3    1         1
# 3    3   25         3    2         1   13         2
# 4    4   74         8   58         7   78         8
# 5    5   18         2   10         2    5         1
# 6    6  100        10   40         4   73         8

The code works perfectly. But… what if I have 100 variables or more? I cannot do the process 100 times or even more.

I want to avoid using loops and I have been trying to use the lapply family.
I have already seen how to add new columns with lapply in this post but I don’t know if there is a way to add the new column next to the column that it is using as I have in the above example.

Does anybody have an idea about how to do it?

>Solution :

You are on the right track. You can do this with lapply:

cols_to_include  <- grep("^Var", names(df), value = TRUE) # "Var1" "Var2" "Var3"

new_names  <- paste0("Quant", cols_to_include) # "QuantVar1" "QuantVar2" "QuantVar3"

df[new_names]  <- lapply(
    cols_to_include, 
    \(col) {
        .bincode(
            x = df[[col]],
            breaks = quantiles[[col]],
            include.lowest = TRUE,
            right = TRUE
        )
    }
) 

head(df)
#   time Var1 Var2 Var3 QuantVar1 QuantVar2 QuantVar3
# 1    1   53   83   49         5         9         5
# 2    2   56   64   61         6         7         6
# 3    3   13   77   20         2         9         2
# 4    4  100   73    6        10         8         1
# 5    5   87   75   65         9         8         6
# 6    6   52    9   92         5         2         9

Note: output not identical to yours as we did not use a fixed seed but it should work.

Sorting the columns

In this case you can just order the columns by sorting on the digits in each column name:

new_order  <- order(gsub("\\D+", "", names(df)))

# Change order
df  <- df[new_order]

head(df)
#   time Var1 QuantVar1 Var2 QuantVar2 Var3 QuantVar3
# 1    1   53         5   83         9   49         5
# 2    2   56         6   64         7   61         6
# 3    3   13         2   77         9   20         2
# 4    4  100        10   73         8    6         1
# 5    5   87         9   75         8   65         6
# 6    6   52         5    9         2   92         9
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading