I'm trying to write a loop that takes the values in some columns, applies a formula, and writes the results to new columns named after the originals plus a suffix.
I have a data frame with 100,000 rows and 53 columns; columns 3 to 29 are used in the calculation.
What I have so far:
GC05cr_h16_dat2$ln1 <- 0
for(i in 3:29) {
log_read <- log(GC05cr_h16_dat2[ , i] +1) /max(GC05cr_h16_dat2[ , i])
GC05cr_h16_dat2$ln1[i] <- log_read
}
The table:
head(GC05cr_h16_dat2)
# A tibble: 6 × 54
chr start EE87893 EE87894 EE87895 EE87896 EE87897 EE87898 EE87899 EE87900 EE87901 EE87902 EE87903
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 chr3 1.45e8 4 2 4 2 4 2 5 5 4 1 4
2 chr4 1.63e8 2 4 3 1 1 4 5 5 5 4 5
3 chr4 3.57e7 3 5 3 1 6 6 5 10 4 6 3
4 chr18 6.58e7 2 1 6 6 2 1 3 5 3 4 1
5 chr10 8.43e7 5 1 4 3 1 5 0 11 2 4 8
6 chr3 1.84e8 5 1 3 5 5 4 3 9 4 3 6
#
The results for all the columns covered by the for loop end up in a single column named ln1.
I expected the result for each column to go into its own column, named after the source column with the suffix ln1:
EE87893ln1 EE87894ln1 EE87895ln1 .....
>Solution :
You can keep most of your existing code: loop over the names of the columns of interest instead of over integer positions, then create each new column directly using paste0:
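# Loop over the names of the EE columns; the names are fixed before the loop starts
for (i in names(GC05cr_h16_dat2)[grep("EE", names(GC05cr_h16_dat2))]) {
  x <- GC05cr_h16_dat2[[i]]  # [[ ]] extracts a plain vector, also from a tibble
  GC05cr_h16_dat2[[paste0(i, "ln1")]] <- log(x + 1) / max(x)
}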
Output (you may need to scroll over because the output is wide):
chr start EE87893 EE87894 EE87895 EE87896 EE87897 EE87898 EE87899 EE87900 EE87901 EE87902 EE87903 EE87893ln1 EE87894ln1 EE87895ln1 EE87896ln1 EE87897ln1 EE87898ln1 EE87899ln1 EE87900ln1 EE87901ln1 EE87902ln1 EE87903ln1
1 chr3 1.45e+08 4 2 4 2 4 2 5 5 4 1 4 0.3218876 0.2197225 0.2682397 0.1831020 0.2682397 0.1831020 0.3583519 0.1628872 0.3218876 0.1155245 0.2011797
2 chr4 1.63e+08 2 4 3 1 1 4 5 5 5 4 5 0.2197225 0.3218876 0.2310491 0.1155245 0.1155245 0.2682397 0.3583519 0.1628872 0.3583519 0.2682397 0.2239699
3 chr4 3.57e+07 3 5 3 1 6 6 5 10 4 6 3 0.2772589 0.3583519 0.2310491 0.1155245 0.3243184 0.3243184 0.3583519 0.2179905 0.3218876 0.3243184 0.1732868
4 chr18 6.58e+07 2 1 6 6 2 1 3 5 3 4 1 0.2197225 0.1386294 0.3243184 0.3243184 0.1831020 0.1155245 0.2772589 0.1628872 0.2772589 0.2682397 0.0866434
5 chr10 8.43e+07 5 1 4 3 1 5 0 11 2 4 8 0.3583519 0.1386294 0.2682397 0.2310491 0.1155245 0.2986266 0.0000000 0.2259006 0.2197225 0.2682397 0.2746531
6 chr3 1.84e+08 5 1 3 5 5 4 3 9 4 3 6 0.3583519 0.1386294 0.2310491 0.2986266 0.2986266 0.2682397 0.2772589 0.2093259 0.3218876 0.2310491 0.2432388
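To try the loop without the full table, here is a minimal self-contained sketch. It assumes a plain data frame named dat (a made-up name) holding only the first three EE columns of the head() output from the question; the column range 3:5 applies to this toy data only.

```r
# Toy data: chr, start and the first three sample columns from the question
dat <- data.frame(
  chr     = c("chr3", "chr4", "chr4", "chr18", "chr10", "chr3"),
  start   = c(1.45e8, 1.63e8, 3.57e7, 6.58e7, 8.43e7, 1.84e8),
  EE87893 = c(4, 2, 3, 2, 5, 5),
  EE87894 = c(2, 4, 5, 1, 1, 1),
  EE87895 = c(4, 3, 3, 6, 4, 3)
)

# Fix the set of input columns before the loop, so the newly
# created *ln1 columns are never treated as inputs
cols <- names(dat)[3:5]

for (col in cols) {
  x <- dat[[col]]                                   # plain numeric vector
  dat[[paste0(col, "ln1")]] <- log(x + 1) / max(x)  # one new column per input
}

print(names(dat))
print(dat$EE87893ln1)
```

Each pass through the loop adds exactly one column, so dat ends up with three extra columns (EE87893ln1, EE87894ln1, EE87895ln1) placed after the originals.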
If you want to specify the columns by position instead (i.e., columns 3 through 29), just use:
for (i in names(GC05cr_h16_dat2)[3:29]){...}
You could also use lapply for this (3:13 covers the EE columns in the sample above; use 3:29 for the full table):
GC05cr_h16_dat2[paste0(names(GC05cr_h16_dat2)[3:13], "ln1")] <-
lapply(GC05cr_h16_dat2[3:13], function(x) log(x + 1) / max(x))
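If the same transformation is needed more than once, the lapply version can be wrapped in a small function. The sketch below is only an illustration: add_log_cols, its suffix argument, and its na.rm argument are hypothetical names that do not appear in the code above, and the toy data frame is made up. The na.rm option is an assumption about how missing values might be handled; the original formula calls max(x) without it.

```r
# Hypothetical helper: adds one transformed column per input column,
# named <column><suffix>. `cols` may be positions or column names.
add_log_cols <- function(df, cols, suffix = "ln1", na.rm = FALSE) {
  src <- names(df[cols])  # resolve positions or names to column names
  df[paste0(src, suffix)] <- lapply(
    df[src],
    function(x) log(x + 1) / max(x, na.rm = na.rm)
  )
  df
}

# Made-up toy data with one missing value
toy <- data.frame(id = 1:3, a = c(0, 1, 3), b = c(2, NA, 4))
res <- add_log_cols(toy, 2:3, na.rm = TRUE)
print(res)
```

With na.rm = FALSE (the default here, matching the original formula), a single NA in a column makes max(x) return NA, so every value in that column's new column would be NA.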