Find index with close elements in a specific order

April 1, 2022

I have a fairly specific problem that I haven’t been able to solve. I have the following data frame:

df=structure(list(`1` = c(1, 0.980804939576247, 0.972001297465136, 
0.951775369398176, 0.905954756602819, 0.869053717987925, 0.843688917703845, 
0.799322227399393, 0.770010757762774, 0.717895307194166, 0.712648001576544
), `2` = c(0.980804939576247, 1, 0.99286934359771, 0.9780399371819, 
0.941290827027173, 0.902739825763346, 0.876213994786973, 0.831833910247186, 
0.786187344365065, 0.731092012418539, 0.732455285949785), `3` = c(0.972001297465136, 
0.99286934359771, 1, 0.9887897871777, 0.961069475772382, 0.92918675685132, 
0.903192705982216, 0.863032195414035, 0.820090444886175, 0.770215571188602, 
0.773443501596164), `4` = c(0.951775369398176, 0.9780399371819, 
0.9887897871777, 1, 0.981635495343049, 0.962754871356052, 0.941856408218425, 
0.905436805112006, 0.865215209390991, 0.815514765839081, 0.816238416736926
), `5` = c(0.905954756602819, 0.941290827027173, 0.961069475772382, 
0.981635495343049, 1, 0.986502994052612, 0.96303136666527, 0.930702553832032, 
0.890077164568825, 0.84619540384738, 0.850458309930501), `6` = c(0.869053717987925, 
0.902739825763346, 0.92918675685132, 0.962754871356052, 0.986502994052612, 
1, 0.991664811336722, 0.964662978037505, 0.929693736668219, 0.888837183872409, 
0.889164192629321), `7` = c(0.843688917703845, 0.876213994786973, 
0.903192705982216, 0.941856408218425, 0.96303136666527, 0.991664811336722, 
1, 0.982618079584971, 0.948031248412296, 0.911910748833129, 0.905557686967705
), `8` = c(0.799322227399393, 0.831833910247186, 0.863032195414035, 
0.905436805112006, 0.930702553832032, 0.964662978037505, 0.982618079584971, 
1, 0.978774807399762, 0.960378091397436, 0.95238261682306), `9` = c(0.770010757762774, 
0.786187344365065, 0.820090444886175, 0.865215209390991, 0.890077164568825, 
0.929693736668219, 0.948031248412296, 0.978774807399762, 1, 0.993448147603104, 
0.988079442756139), `10` = c(0.717895307194166, 0.731092012418539, 
0.770215571188602, 0.815514765839081, 0.84619540384738, 0.888837183872409, 
0.911910748833129, 0.960378091397436, 0.993448147603104, 1, 0.995198123043832
), `11` = c(0.712648001576544, 0.732455285949785, 0.773443501596164, 
0.816238416736926, 0.850458309930501, 0.889164192629321, 0.905557686967705, 
0.95238261682306, 0.988079442756139, 0.995198123043832, 1)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))

I want to group the columns in a specific way. Starting from column 1, I want the indices of the columns whose values in row 1 are larger than 0.95, which gives me columns 1:4.

Then, starting from column 5 and row 5, I want to know which of the subsequent columns have values in row 5 that are again larger than 0.95, which gives columns 5:7.

And so on.

The final result would be:

c1=c(1:4)
c2=c(5:7)
c3=c(8:11)

I am having trouble doing this with if statements, and that approach is also very inefficient. Is there a faster way to get this result?

Solution:

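I think you want to sequentially trim down a copy of df: take the columns whose values in the first row are above 0.95, drop the matching rows and columns so that a smaller square from the bottom right of the data frame remains, and repeat until the first remaining row has no value above 0.95 (in practice, until nothing is left):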

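df2 <- df            # working copy that shrinks on each pass
results <- list()

repeat {
  # Stop when the first remaining row has no value above 0.95 (e.g. df2 is empty)
  if (!any(df2[1, ] > 0.95)) break
  indices <- which(df2[1, ] > 0.95)
  # Shift positions within df2 back to column numbers of the original df
  answer <- indices + length(df) - length(df2)
  # drop = FALSE keeps df2 a data frame even if only one column remains
  df2 <- df2[-indices, -indices, drop = FALSE]
  results[[length(results) + 1]] <- answer
}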

results
#> [[1]]
#> [1] 1 2 3 4
#> 
#> [[2]]
#> [1] 5 6 7
#> 
#> [[3]]
#> [1]  8  9 10 11
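
The same grouping can probably also be computed without shrinking a copy, by walking a start position along the diagonal. The following is only a sketch under some assumptions: `group_from_diagonal` and the small matrix `m` are hypothetical names and made-up data, not part of the question, and each group is ended at the first value that is not above the threshold (which matches the loop above whenever the matches in a row are contiguous, as they are in the question's data).

```r
# Hypothetical helper: group columns by walking along the diagonal.
# Assumes x is a square numeric matrix or data frame without missing values.
group_from_diagonal <- function(x, threshold = 0.95) {
  x <- as.matrix(x)
  n <- ncol(x)
  groups <- list()
  start <- 1
  while (start <= n) {
    cols <- start:n
    above <- x[start, cols] > threshold
    # keep only the leading run of columns above the threshold
    hits <- cols[cumsum(!above) == 0]
    # guard: if even the diagonal entry fails, the column forms its own group
    if (length(hits) == 0) hits <- start
    groups[[length(groups) + 1]] <- hits
    start <- max(hits) + 1
  }
  groups
}

# Small made-up similarity matrix, for illustration only
m <- matrix(c(1.00, 0.97, 0.80, 0.70, 0.60,
              0.97, 1.00, 0.85, 0.75, 0.65,
              0.80, 0.85, 1.00, 0.98, 0.96,
              0.70, 0.75, 0.98, 1.00, 0.99,
              0.60, 0.65, 0.96, 0.99, 1.00), nrow = 5, byrow = TRUE)

groups <- group_from_diagonal(m)
groups
```

For this made-up matrix the groups are columns 1:2 and 3:5. Calling `group_from_diagonal(df)` on the data frame from the question should give the same three groups as `results`, since the matches in each row there are contiguous.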

The results list also makes it easy to retrieve the matched sub-matrices:

lapply(results, function(x) df[x, x])
#> [[1]]
#>           1         2         3         4
#> 1 1.0000000 0.9808049 0.9720013 0.9517754
#> 2 0.9808049 1.0000000 0.9928693 0.9780399
#> 3 0.9720013 0.9928693 1.0000000 0.9887898
#> 4 0.9517754 0.9780399 0.9887898 1.0000000
#> 
#> [[2]]
#>           5         6         7
#> 5 1.0000000 0.9865030 0.9630314
#> 6 0.9865030 1.0000000 0.9916648
#> 7 0.9630314 0.9916648 1.0000000
#> 
#> [[3]]
#>            8         9        10        11
#> 8  1.0000000 0.9787748 0.9603781 0.9523826
#> 9  0.9787748 1.0000000 0.9934481 0.9880794
#> 10 0.9603781 0.9934481 1.0000000 0.9951981
#> 11 0.9523826 0.9880794 0.9951981 1.0000000
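
If a single group label per column is more convenient than a list of index vectors, the list can be flattened into a membership vector. This is a minimal sketch: `results` is restated from the output above so the snippet runs on its own, and `membership` is a name chosen here for illustration.

```r
# Groups as printed above, restated so the snippet is self-contained
results <- list(1:4, 5:7, 8:11)

# membership[i] is the number of the group that column i belongs to
membership <- integer(length(unlist(results)))
membership[unlist(results)] <- rep(seq_along(results), lengths(results))
membership
#> [1] 1 1 1 1 2 2 2 3 3 3 3
```

A vector in this form can be passed to functions such as `split()` or `tapply()` when the columns need to be processed group by group.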