R: Performing Matches Between Data Frames and Matrices

I am working with the R programming language.

I have the following matrix:

set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))

mat <- t(apply(mat, 1, function(x) if(sum(x) > 1) x/sum(x) else x))

          111        222        333       444       555        666        777        888        999    101010
aaa 0.0000000 0.28047306 0.19428708 0.1856996 0.0000000 0.00000000 0.15062710 0.18891319 0.00000000 0.0000000
bbb 0.1409196 0.00000000 0.13541406 0.3774219 0.0000000 0.00000000 0.00000000 0.07783414 0.13229252 0.1361178
ccc 0.0000000 0.02648093 0.13420016 0.2935025 0.0000000 0.16917150 0.00000000 0.37664493 0.00000000 0.0000000
ddd 0.2549304 0.25312832 0.05869772 0.1968660 0.0000000 0.00000000 0.00000000 0.00000000 0.07078359 0.1655940
eee 0.3661311 0.00000000 0.28014223 0.0000000 0.0000000 0.08423207 0.26949462 0.00000000 0.00000000 0.0000000
fff 0.0000000 0.12631388 0.87368612 0.0000000 0.0000000 0.00000000 0.00000000 0.00000000 0.00000000 0.0000000
ggg 0.2768805 0.00000000 0.04669054 0.2488325 0.0000000 0.00000000 0.22416403 0.00000000 0.08024861 0.1231838
hhh 0.2727062 0.00000000 0.04078665 0.0000000 0.0000000 0.09716544 0.09905154 0.23736023 0.25292995 0.0000000
iii 0.1882696 0.00000000 0.00000000 0.0000000 0.0000000 0.20389182 0.18921225 0.00000000 0.41862635 0.0000000
jjj 0.0000000 0.23662317 0.00000000 0.0000000 0.4282713 0.00000000 0.00000000 0.00000000 0.00000000 0.3351055

I also have this "legend", which assigns a color to each column of the matrix:

set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)

   col_names colors
1        111   blue
2        222   blue
3        333   blue
4        444  green
5        555   blue
6        666  green
7        777  green
8        888  green
9        999   blue
10    101010    red

My question: for each row of the matrix, I am trying to find the percentage of the row's total that falls in the columns of each color.

The final output should look something like this (first row):

   id      blue     green red
1 aaa 0.4747601 0.5252399   0
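
As a check on that target row, the numbers can be reproduced by hand. This is only a sketch: the values are copied from the printed matrix and legend above rather than recomputed, and the variable names (aaa, blue, green, red) are made up for the illustration.

```r
# Non-zero entries of row 'aaa', copied from the printed matrix above.
aaa <- c('222' = 0.28047306, '333' = 0.19428708, '444' = 0.1856996,
         '777' = 0.15062710, '888' = 0.18891319)

# Per the legend: '222' and '333' are blue; '444', '777' and '888' are green.
blue  <- sum(aaa[c('222', '333')])
green <- sum(aaa[c('444', '777', '888')])
red   <- 0  # the only red column, '101010', is 0 in this row

c(blue = blue, green = green, red = red)
```

Note that this target row is expressed as proportions (they add up to 1), so multiplying by 100 gives the same information as percentages.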

I tried to do this with the following code:

# Match colors with matrix columns
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]

# Calculate percentage for each color
color_perc <- t(apply(mat, 1, function(x) {
  c(
    blue = sum(x[col_colors == "blue"]) * 100,
    green = sum(x[col_colors == "green"]) * 100,
    red = sum(x[col_colors == "red"]) * 100
  )
}))

# Combine with row names
final <- data.frame(id = rownames(mat), color_perc)

The result looks something like this:

     id      blue    green      red
aaa aaa  47.47601 52.52399  0.00000
bbb bbb  40.86262 45.52560 13.61178
ccc ccc  16.06811 83.93189  0.00000
ddd ddd  63.75400 19.68660 16.55940
eee eee  64.62733 35.37267  0.00000
fff fff 100.00000  0.00000  0.00000
ggg ggg  40.38197 47.29965 12.31838
hhh hhh  56.64228 43.35772  0.00000
iii iii  60.68959 39.31041  0.00000
jjj jjj  66.48945  0.00000 33.51055

Can someone please tell me if I have done this correctly?

Thanks!

Solution:

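We could do this with split(): group the column names by color, then take rowSums() over each group of columns. This gives the same values as your code, rounded to 3 decimals.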

round(100 * sapply(with(color_df, split(col_names, colors)),
     \(nm) rowSums(mat[, nm, drop = FALSE])), 3)

Output:

       blue  green    red
aaa  47.476 52.524  0.000
bbb  40.863 45.526 13.612
ccc  16.068 83.932  0.000
ddd  63.754 19.687 16.559
eee  64.627 35.373  0.000
fff 100.000  0.000  0.000
ggg  40.382 47.300 12.318
hhh  56.642 43.358  0.000
iii  60.690 39.310  0.000
jjj  66.489  0.000 33.511
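
To confirm that the two approaches agree, one option is to compute both and compare them with all.equal(). The following is a minimal sketch, not a definitive test: it rebuilds the inputs from the question so that it runs on its own, the names by_match and by_split are made up for the illustration, and it assumes R 4.1 or later (for the \(nm) shorthand, and so that data.frame() keeps col_names as character rather than factor).

```r
# Rebuild the inputs from the question.
set.seed(123)
mat <- matrix(ifelse(runif(100) < 0.5, 0, runif(100)), nrow = 10, ncol = 10,
              dimnames = list(c('aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii', 'jjj'),
                              c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')))
mat <- t(apply(mat, 1, function(x) if (sum(x) > 1) x / sum(x) else x))

set.seed(123)
col_names <- c('111', '222', '333', '444', '555', '666', '777', '888', '999', '101010')
colors <- sample(c('red', 'green', 'blue'), 10, replace = TRUE)
color_df <- data.frame(col_names, colors)

# Approach from the question: look up each column's color, then sum per color.
col_colors <- color_df$colors[match(colnames(mat), color_df$col_names)]
stopifnot(!anyNA(col_colors))  # every matrix column must appear in the legend
by_match <- t(apply(mat, 1, function(x) {
  c(blue  = sum(x[col_colors == "blue"]),
    green = sum(x[col_colors == "green"]),
    red   = sum(x[col_colors == "red"]))
})) * 100

# Approach from this answer: split column names by color, then rowSums().
by_split <- 100 * sapply(with(color_df, split(col_names, colors)),
                         \(nm) rowSums(mat[, nm, drop = FALSE]))

# TRUE if both give the same numbers (up to floating-point tolerance).
all.equal(by_match, by_split, check.attributes = FALSE)
```

One caveat: the percentages in a row add up to 100 only if that row of mat sums to 1. The question's code rescales a row only when its sum is greater than 1, so a row with a smaller sum would keep its original total.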