I am working with the R programming language.
I have the following matrix:
set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
mat <- t(apply(mat, 1, function(x) if(sum(x) > 1) x/sum(x) else x))
111 222 333 444 555 666 777 888 999 101010
aaa 0.0000000 0.28047306 0.19428708 0.1856996 0.0000000 0.00000000 0.15062710 0.18891319 0.00000000 0.0000000
bbb 0.1409196 0.00000000 0.13541406 0.3774219 0.0000000 0.00000000 0.00000000 0.07783414 0.13229252 0.1361178
ccc 0.0000000 0.02648093 0.13420016 0.2935025 0.0000000 0.16917150 0.00000000 0.37664493 0.00000000 0.0000000
ddd 0.2549304 0.25312832 0.05869772 0.1968660 0.0000000 0.00000000 0.00000000 0.00000000 0.07078359 0.1655940
eee 0.3661311 0.00000000 0.28014223 0.0000000 0.0000000 0.08423207 0.26949462 0.00000000 0.00000000 0.0000000
fff 0.0000000 0.12631388 0.87368612 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.0000000
ggg 0.2768805 0.00000000 0.04669054 0.2488325 0.0000000 0.00000000 0.22416403 0.00000000 0.08024861 0.1231838
hhh 0.2727062 0.00000000 0.04078665 0.0000000 0.0000000 0.09716544 0.09905154 0.23736023 0.25292995 0.0000000
iii 0.1882696 0.00000000 0.00000000 0.0000000 0.0000000 0.20389182 0.18921225 0.00000000 0.41862635 0.0000000
jjj 0.0000000 0.23662317 0.00000000 0.0000000 0.4282713 0.00000000 0.00000000 0.00000000 0.00000000 0.3351055
I also have this "legend", which assigns a color to each column of the matrix:
set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)
col_names colors
1 111 blue
2 222 blue
3 333 blue
4 444 green
5 555 blue
6 666 green
7 777 green
8 888 green
9 999 blue
10 101010 red
My question: for each row of the matrix, I am trying to find what percentage of the row's total falls under each color.
The final output should look something like this (first row):
id blue green red
1 aaa 0.4747601 0.5252399 0
I tried to do this with the following code:
# Match colors with matrix columns
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]
# Calculate percentage for each color
color_perc <- t(apply(mat, 1, function(x) {
  c(
    blue = sum(x[col_colors == "blue"]) * 100,
    green = sum(x[col_colors == "green"]) * 100,
    red = sum(x[col_colors == "red"]) * 100
  )
}))
# Combine with row names
final <- data.frame(id = rownames(mat), color_perc)
The result looks something like this:
id blue green red
aaa aaa 47.47601 52.52399 0.00000
bbb bbb 40.86262 45.52560 13.61178
ccc ccc 16.06811 83.93189 0.00000
ddd ddd 63.75400 19.68660 16.55940
eee eee 64.62733 35.37267 0.00000
fff fff 100.00000 0.00000 0.00000
ggg ggg 40.38197 47.29965 12.31838
hhh hhh 56.64228 43.35772 0.00000
iii iii 60.68959 39.31041 0.00000
jjj jjj 66.48945 0.00000 33.51055
Can someone please tell me if I have done this correctly?
Thanks!
>Solution :
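Your code gives the same values as the approach below, so the calculation is correct. A more compact way is to split() the column names by color and take rowSums() over each group of columns (the \(nm) shorthand requires R 4.1 or later):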
round(100 * sapply(with(color_df, split(col_names, colors)),
\(nm) rowSums(mat[, nm, drop = FALSE])), 3)
Output:
blue green red
aaa 47.476 52.524 0.000
bbb 40.863 45.526 13.612
ccc 16.068 83.932 0.000
ddd 63.754 19.687 16.559
eee 64.627 35.373 0.000
fff 100.000 0.000 0.000
ggg 40.382 47.300 12.318
hhh 56.642 43.358 0.000
iii 60.690 39.310 0.000
jjj 66.489 0.000 33.511
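As a cross-check, here is a minimal sketch that rebuilds the data from the question and compares the split() approach with a second base R route, rowsum() on the transposed matrix. The names split_perc and rowsum_perc are introduced here for illustration only, stringsAsFactors = FALSE is added as an assumption so the column names stay character on older R versions, and function(nm) is used in place of \(nm) so the code also runs before R 4.1.

```r
# Rebuild the matrix and the color legend from the question
set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
mat <- t(apply(mat, 1, function(x) if (sum(x) > 1) x / sum(x) else x))

set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors, stringsAsFactors = FALSE)

# Approach 1: group the column names by color, then sum those columns per row
split_perc <- 100 * sapply(split(color_df$col_names, color_df$colors),
                           function(nm) rowSums(mat[, nm, drop = FALSE]))

# Approach 2: rowsum() adds up rows by group, so transpose first and back again
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]
rowsum_perc <- 100 * t(rowsum(t(mat), group = col_colors))

# Both approaches should agree up to floating-point tolerance
print(all.equal(split_perc, rowsum_perc, check.attributes = FALSE))

# Every column has exactly one color, so each row's percentages
# should add up to 100 times the original row total
print(all.equal(rowSums(split_perc), 100 * rowSums(mat)))
```

Neither approach hard-codes the color names, so both should keep working if the legend gains or loses a color.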