I have two data frames, mval and meth_deconv, with shared columns but different row values. I want to perform min-max normalization of mval based on the meth_deconv values.
common.cols <- intersect(colnames(mval), colnames(meth_deconv))
meth_deconv <- meth_deconv[,common.cols]
mval <- mval[,common.cols]
bval <- bval[,common.cols]
for (col in colnames(mval)) {
min <- min(meth_deconv[[col]])
max <- max(meth_deconv[[col]])
mval[[col]] <- (mval[[col]] - min) / (max - min)
}
Traceback:
> for (col in colnames(mval)) {
+ min <- min(meth_deconv[[col]])
+ max <- max(meth_deconv[[col]])
+ mval[[col]] <- (mval[[col]] - min) / (max - min)
+ }
Error in mval[[col]] : subscript out of bounds
Input:
> dput(meth_deconv[1:5,1:5])
structure(list(TCGA.Y8.A8RZ.01 = c(0.129859982131871, 0.0357708166456001,
0, 0.133656384812674, 0.0666114231385833), TCGA.Y8.A8RY.01 = c(0.114822027432518,
0.0182327682610597, 0, 0.154950359997823, 0.0170537545658276),
TCGA.Y8.A897.01 = c(0.0733882956002282, 0.0156764793850076,
0, 0.142084581990467, 0.0464498830958926), TCGA.Y8.A896.01 = c(0.105826996952733,
0.0298500219688853, 0, 0.139574516141476, 0.0352706140819193
), TCGA.Y8.A895.01 = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_)), row.names = c("Bcell", "CD8", "Dendritic", "Endo",
"Eos"), class = "data.frame")
> dput(mval[1:5,1:5])
structure(c(2.20666978271644, 2.21762842677891, -4.07494124222421,
-4.13722707002192, -3.43314164549568, 2.33449419612022, 2.34788404801465,
-3.75292484979324, -4.3115910063775, -4.31229291319228, 2.54516913102614,
3.15809412595788, -2.12378973913844, -4.35973967501755, -4.39347889615609,
2.14840959318955, 1.81982095876368, -3.46795103846624, -4.29965006722576,
-4.40595273662642, 2.66361259477239, 2.62697164963472, -1.88151767905837,
-4.13446638546434, -4.09928030669639), dim = c(5L, 5L), dimnames = list(
c("cg00000957", "cg00001349", "cg00001583", "cg00002028",
"cg00002719"), c("TCGA.Y8.A8RZ.01", "TCGA.Y8.A8RY.01", "TCGA.Y8.A897.01",
"TCGA.Y8.A896.01", "TCGA.Y8.A895.01")))
Solution:
This is because mval is a matrix rather than a data frame (meth_deconv is a data frame, which is why meth_deconv[[col]] works). When you use [[ notation on a matrix, the matrix acts like a vector. For example:
mval[[1]]
# [1] 2.20667
This returns the first element rather than the first column. Note what happens if you try to use [[ with a column name:
mval[["TCGA.Y8.A895.01"]]
# Error in mval[["TCGA.Y8.A895.01"]] : subscript out of bounds
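The difference between the two object types can be seen on a small toy example. This is only a sketch: the names m and df and their values are made up for illustration and are not the original data.

```r
# Toy objects (hypothetical names and values, not the original data)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
df <- as.data.frame(m)

m[[3]]      # a single element: the matrix is indexed like a vector
df[["b"]]   # a whole column: a data frame is a list of columns
m[, "b"]    # a whole column of the matrix

# [[ with a column name fails on the matrix; capture the message
# instead of stopping the script
msg <- tryCatch(m[["b"]], error = function(e) conditionMessage(e))
msg
```

The same column name works with [[ on df but not on m, which matches the error in the question.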
To refer to a column by its name, use mval[, col] instead:
mval[, "TCGA.Y8.A895.01"]
# cg00000957 cg00001349 cg00001583 cg00002028 cg00002719
# 2.663613 2.626972 -1.881518 -4.134466 -4.099280
Note that this returns a vector. To return a one-column matrix, use mval[, "TCGA.Y8.A895.01", drop = FALSE]. See the "Simplifying vs preserving subsetting" section of Advanced R by Hadley Wickham for more.
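As a quick check of simplifying versus preserving subsetting, the following sketch compares the two forms on a toy matrix (m is a hypothetical stand-in, not the original data).

```r
# Toy matrix (hypothetical values)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

v <- m[, "b"]                # simplified: a named vector
k <- m[, "b", drop = FALSE]  # preserved: still a matrix

dim(v)  # NULL, because a plain vector has no dimensions
dim(k)  # two rows, one column
```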
If you use mval[, col] notation, your code will work:
for (col in colnames(mval)) {
  # Use names that do not mask the base min() and max() functions
  col_min <- min(meth_deconv[[col]])
  col_max <- max(meth_deconv[[col]])
  mval[, col] <- (mval[, col] - col_min) / (col_max - col_min)
}
However, you do not need a loop here. You can do the same with mapply():
mapply(
\(x, y) (y - min(x)) / (max(x) - min(x)),
asplit(meth_deconv, 2), asplit(mval, 2)
)
# TCGA.Y8.A8RZ.01 TCGA.Y8.A8RY.01 TCGA.Y8.A897.01 TCGA.Y8.A896.01 TCGA.Y8.A895.01
# cg00000957 16.51002 15.06608 17.91306 15.39256 NA
# cg00001349 16.59201 15.15249 22.22686 13.03835 NA
# cg00001583 -30.48819 -24.22018 -14.94736 -24.84659 NA
# cg00002028 -30.95420 -27.82563 -30.68412 -30.80541 NA
# cg00002719 -25.68633 -27.83016 -30.92157 -31.56703 NA
Note that we asplit() each object into a list of columns to iterate over, as otherwise a matrix is treated as a vector and mapply() iterates over its individual elements.
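Another possible approach, not part of the original answer, is base R's sweep(), which applies a per-column statistic across a matrix. The sketch below uses toy stand-ins (ref for meth_deconv, x for mval, with made-up values) and assumes both objects have the same columns in the same order, as they do after subsetting by common.cols. It also checks the result against the loop.

```r
# Toy stand-ins (hypothetical values): ref plays the role of
# meth_deconv (a data frame), x plays the role of mval (a matrix)
ref <- data.frame(s1 = c(0, 0.5, 1), s2 = c(0.2, 0.4, 0.6))
x <- matrix(c(1, 2, 3, 4), nrow = 2,
            dimnames = list(c("p1", "p2"), c("s1", "s2")))

# Per-column minimum and range of the reference data
col_min <- vapply(ref, min, numeric(1))
col_rng <- vapply(ref, max, numeric(1)) - col_min

# Subtract each column's minimum, then divide by each column's range
scaled <- sweep(sweep(x, 2, col_min, "-"), 2, col_rng, "/")

# The same computation with the loop, for comparison
looped <- x
for (col in colnames(x)) {
  lo <- min(ref[[col]])
  hi <- max(ref[[col]])
  looped[, col] <- (x[, col] - lo) / (hi - lo)
}

all.equal(scaled, looped)
```

As with the loop and mapply() versions, a reference column that contains NA gives NA for that whole column.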