Aggregate rows with different order of naming

March 8, 2023

Let’s say I have a table as follows:

df <- data.frame("interaction" = c("A_B","A_C","D_B","B_A", "C_A"), 
                 "value" = c(10,25,6,1,4))

  interaction value
1         A_B    10
2         A_C    25
3         D_B     6
4         B_A     1
5         C_A     4

I would like to add values that are referring to the same thing (for instance A_B and B_A are the same columns essentially). The results that I want would like so:

 interaction value
1         A_B    11
2         A_C    29
3         D_B     6

Any help is appreciated. Thank you!

>Solution :

Here is one way using dplyr and tidyr library. Separate the interaction column in two separate columns by splitting on "_", sort them and get the sum of value for each group.

You may then combine the two sorted columns again using unite.

library(dplyr)
library(tidyr)

df %>%
  separate(interaction, c('col1', 'col2'), sep = '_') %>%
  group_by(group1 = pmin(col1, col2), group2 = pmax(col1, col2)) %>%
  summarise(value = sum(value), .groups = "drop") %>%
  unite(interaction, group1, group2)
  

# interaction value
#  <chr>       <dbl>
#1 A_B            10
#2 A_C            29
#3 B_D             6