Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Update records of dataframe where row names equal to column names

Hello I have a large dataframe where I have need to update or fill in the missing values. The condition is that the correct value should be in respect to row name matching to a column name. For example:

df <- data.frame(
  
  ID = c("x", "y", "z"), 
  x = c("1", "0.45", "0.47"),
  y = c("0.45", "1", "0.65"),
  z = c("XXXX", "XXXX", "1")
)

  ID    x    y    z
1  x    1 0.45 XXXX
2  y 0.45    1 XXXX
3  z 0.47 0.65    1

Above, the correct values for xxxx should be 0.47 and 0.65, respectively. Because x col z row has 0.47. Then y col z row is 0.65.The dataframe then looks like this:

  ID    x    y    z
1  x    1 0.45 0.47
2  y 0.45    1 0.65
3  z 0.47 0.65    1

This way all the elements row wise or column wise would correspond to matching rowname and column name.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have looked into these posts but I couldn’t use them to advance my solutions:

  1. Match Column to Column Names, add value to row/column of Matches

  2. data frame set value based on matching specific row name to column name

  3. Change values of matrix where row names equal column names

>Solution :

We could do this in base R

i1 <- upper.tri(df[-1])
i2 <- lower.tri(df[-1])
 df[-1][i1] <- df[-1][i2]

-output

> df
  ID    x    y    z
1  x    1 0.45 0.47
2  y 0.45    1 0.65
3  z 0.47 0.65    1

Or use coalesce

library(dplyr)
df[-1][upper.tri(df[-1])] <- coalesce(na_if(df[-1][upper.tri(df[-1])], 
        "XXXX"), df[-1][lower.tri(df[-1])])

Or may convert to ‘long’ format and then reshape back to wide format after filling

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
  mutate(across(everything(),  ~  na_if(.x, "XXXX"))) %>% 
  pivot_longer(cols = -ID) %>%
  group_by(ID2 = str_c(pmin(ID, name), pmax(ID, name))) %>% 
  fill(value, .direction = "downup") %>% 
  ungroup %>% 
  select(-ID2) %>% 
  pivot_wider(names_from = name, values_from = value)

-output

# A tibble: 3 × 4
  ID    x     y     z    
  <chr> <chr> <chr> <chr>
1 x     1     0.45  0.47 
2 y     0.45  1     0.65 
3 z     0.47  0.65  1    
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading