Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Aggregate rows with different order of naming

Let’s say I have a table as follows:

df <- data.frame("interaction" = c("A_B","A_C","D_B","B_A", "C_A"), 
                 "value" = c(10,25,6,1,4))

  interaction value
1         A_B    10
2         A_C    25
3         D_B     6
4         B_A     1
5         C_A     4

I would like to add values that are referring to the same thing (for instance A_B and B_A are the same columns essentially). The results that I want would like so:

 interaction value
1         A_B    11
2         A_C    29
3         D_B     6

Any help is appreciated. Thank you!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Here is one way using dplyr and tidyr library. Separate the interaction column in two separate columns by splitting on "_", sort them and get the sum of value for each group.

You may then combine the two sorted columns again using unite.

library(dplyr)
library(tidyr)

df %>%
  separate(interaction, c('col1', 'col2'), sep = '_') %>%
  group_by(group1 = pmin(col1, col2), group2 = pmax(col1, col2)) %>%
  summarise(value = sum(value), .groups = "drop") %>%
  unite(interaction, group1, group2)
  

# interaction value
#  <chr>       <dbl>
#1 A_B            10
#2 A_C            29
#3 B_D             6
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading