Find index with close elements in a specific order

I have a fairly specific problem that I haven’t been able to figure out. I have the following data frame:

df=structure(list(`1` = c(1, 0.980804939576247, 0.972001297465136, 
0.951775369398176, 0.905954756602819, 0.869053717987925, 0.843688917703845, 
0.799322227399393, 0.770010757762774, 0.717895307194166, 0.712648001576544
), `2` = c(0.980804939576247, 1, 0.99286934359771, 0.9780399371819, 
0.941290827027173, 0.902739825763346, 0.876213994786973, 0.831833910247186, 
0.786187344365065, 0.731092012418539, 0.732455285949785), `3` = c(0.972001297465136, 
0.99286934359771, 1, 0.9887897871777, 0.961069475772382, 0.92918675685132, 
0.903192705982216, 0.863032195414035, 0.820090444886175, 0.770215571188602, 
0.773443501596164), `4` = c(0.951775369398176, 0.9780399371819, 
0.9887897871777, 1, 0.981635495343049, 0.962754871356052, 0.941856408218425, 
0.905436805112006, 0.865215209390991, 0.815514765839081, 0.816238416736926
), `5` = c(0.905954756602819, 0.941290827027173, 0.961069475772382, 
0.981635495343049, 1, 0.986502994052612, 0.96303136666527, 0.930702553832032, 
0.890077164568825, 0.84619540384738, 0.850458309930501), `6` = c(0.869053717987925, 
0.902739825763346, 0.92918675685132, 0.962754871356052, 0.986502994052612, 
1, 0.991664811336722, 0.964662978037505, 0.929693736668219, 0.888837183872409, 
0.889164192629321), `7` = c(0.843688917703845, 0.876213994786973, 
0.903192705982216, 0.941856408218425, 0.96303136666527, 0.991664811336722, 
1, 0.982618079584971, 0.948031248412296, 0.911910748833129, 0.905557686967705
), `8` = c(0.799322227399393, 0.831833910247186, 0.863032195414035, 
0.905436805112006, 0.930702553832032, 0.964662978037505, 0.982618079584971, 
1, 0.978774807399762, 0.960378091397436, 0.95238261682306), `9` = c(0.770010757762774, 
0.786187344365065, 0.820090444886175, 0.865215209390991, 0.890077164568825, 
0.929693736668219, 0.948031248412296, 0.978774807399762, 1, 0.993448147603104, 
0.988079442756139), `10` = c(0.717895307194166, 0.731092012418539, 
0.770215571188602, 0.815514765839081, 0.84619540384738, 0.888837183872409, 
0.911910748833129, 0.960378091397436, 0.993448147603104, 1, 0.995198123043832
), `11` = c(0.712648001576544, 0.732455285949785, 0.773443501596164, 
0.816238416736926, 0.850458309930501, 0.889164192629321, 0.905557686967705, 
0.95238261682306, 0.988079442756139, 0.995198123043832, 1)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))

I want to group columns in a specific way. Starting from column 1, I want the index of columns in row 1 which are larger than 0.95, which gives me columns 1:4.

Then, starting from column 5 and row 5, I want to know which of the subsequent columns are again larger than 0.95, which gives columns 5:7.

And so on.

The final result would be:

c1=c(1:4)
c2=c(5:7)
c3=c(8:11)

I am having trouble doing this with if statements, and that approach is also very inefficient. Is there a faster way to get this result?

Solution:

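I think you want to sequentially trim down a copy of df. On each pass, take the columns whose value in the first row is greater than 0.95, record them, and drop the matching rows and columns, which leaves a smaller square at the bottom right of the data frame. Repeat until the first row has no value above 0.95, which in practice means the data frame is empty, because every diagonal entry here is 1: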

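df2 <- df            # working copy that shrinks on every pass
results <- list()

repeat {
  # Stop when the first row has no value above 0.95 (also true once df2 is empty)
  if (!any(df2[1, ] > 0.95)) break
  indices <- which(df2[1, ] > 0.95)   # positions within the trimmed frame
  # Shift by the number of columns already removed to get positions in df
  answer <- indices + length(df) - length(df2)
  # drop = FALSE keeps df2 a data frame even when a single column remains
  df2 <- df2[-indices, -indices, drop = FALSE]
  results[[length(results) + 1]] <- answer
}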

results
#> [[1]]
#> [1] 1 2 3 4
#> 
#> [[2]]
#> [1] 5 6 7
#> 
#> [[3]]
#> [1]  8  9 10 11
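
The offset arithmetic in the loop relies on each set of matches being a leading, contiguous block of the trimmed frame, which holds for the data in the question. If that cannot be guaranteed, one option is to track the original column positions directly. The following is only a sketch: `group_by_threshold` and the `toy` matrix are hypothetical names and made-up values for illustration, not part of the original answer, and the input is assumed to be square with no missing values.

```r
# Hypothetical helper (not from the original answer): greedy grouping that
# keeps track of original column positions instead of computing an offset.
group_by_threshold <- function(m, threshold = 0.95) {
  m <- as.matrix(m)
  remaining <- seq_len(ncol(m))   # original positions not yet assigned
  groups <- list()
  while (length(remaining) > 0) {
    lead <- remaining[1]
    # Remaining columns whose value in the leading row exceeds the threshold
    hit <- remaining[m[lead, remaining] > threshold]
    # Always include the leading column so the loop is guaranteed to end
    hit <- union(lead, hit)
    groups[[length(groups) + 1]] <- hit
    remaining <- setdiff(remaining, hit)
  }
  groups
}

# Toy symmetric matrix with made-up values; column 4 is close to column 1,
# but column 3 is not, so the first group is not contiguous.
toy <- matrix(c(
  1.00, 0.97, 0.90, 0.96, 0.50,
  0.97, 1.00, 0.91, 0.95, 0.52,
  0.90, 0.91, 1.00, 0.89, 0.60,
  0.96, 0.95, 0.89, 1.00, 0.55,
  0.50, 0.52, 0.60, 0.55, 1.00
), nrow = 5, byrow = TRUE)

groups <- group_by_threshold(toy)
# groups holds three vectors: 1 2 4, then 3, then 5
```

On a data frame like the one in the question, calling `group_by_threshold(df)` should give the same three groups as the loop, since the matches there are contiguous.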

This makes it easy to retrieve the matched sub-matrices too:

lapply(results, function(x) df[x, x])
#> [[1]]
#>           1         2         3         4
#> 1 1.0000000 0.9808049 0.9720013 0.9517754
#> 2 0.9808049 1.0000000 0.9928693 0.9780399
#> 3 0.9720013 0.9928693 1.0000000 0.9887898
#> 4 0.9517754 0.9780399 0.9887898 1.0000000
#> 
#> [[2]]
#>           5         6         7
#> 5 1.0000000 0.9865030 0.9630314
#> 6 0.9865030 1.0000000 0.9916648
#> 7 0.9630314 0.9916648 1.0000000
#> 
#> [[3]]
#>            8         9        10        11
#> 8  1.0000000 0.9787748 0.9603781 0.9523826
#> 9  0.9787748 1.0000000 0.9934481 0.9880794
#> 10 0.9603781 0.9934481 1.0000000 0.9951981
#> 11 0.9523826 0.9880794 0.9951981 1.0000000
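
To get closer to the c1, c2, c3 layout asked for in the question, the list can be named, and a per-column group label can be derived from it. This is a small sketch; `named` and `membership` are illustrative names, and `results` is typed in by hand from the output shown earlier so that the snippet runs on its own.

```r
# Group indices as printed by the answer, typed in by hand
results <- list(1:4, 5:7, 8:11)

# Name the groups c1, c2, c3 to match the names used in the question
named <- setNames(results, paste0("c", seq_along(results)))

# One label per original column: which group does each column belong to?
membership <- integer(max(unlist(results)))
membership[unlist(results)] <- rep(seq_along(results), lengths(results))

named$c2
# membership is 1 1 1 1 2 2 2 3 3 3 3
```

A named list is usually easier to work with than separate variables, because the number of groups is not known in advance.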