using group_by in dplyr to merge two character column into one

March 29, 2023

I have a data frame like the case below, and I would like to use group_by function of the dplyr in a way that Gender and Income merged into one column.

df1 <- tribble(
  ~Country, ~Gender, ~var1, ~var2, ~ var3, ~Income,
  "Bangladesh", "F", 2.5, 3, 1.5, "LM",
  "Bangladesh", "M", 4.5, 4.3, 2.7, "LM",
  "Laos", "F", 2.7, 3.2, 6.5, "LM", 
  "Laos", "M", 3.5, 5.1, 8.2, "LM", 
  "Ghana", "F", 8.5, 5, 7.5, "LM",
  "Ghana", "M", 4, 6.7, 1.3, "LM",
  "China", "F", 4.3, 6.1, 2.5, "UM",
  "China", "M", 6.2, 2.8, 6.8, "UM",
)

I have the option to use group_by to join two numeric clolumn like this:

df1 %>% 
  group_by(Country, subgroup = var1 + var2) %>%
  summarise()

But I cannot do that with character clolumn:

df1 %>% 
  group_by(Country, subgroup = Gender + Income) %>%
  summarise()

#Error: ! non-numeric argument to binary operator

what I would like after groupuing by is something like below

df2 <- tribble(
  ~Country, ~subgroup, 
  "Bangladesh", "F", 
  "Bangladesh", "M", 
  "Laos", "F",  
  "Laos", "M", 
  "Ghana", "F", 
  "Ghana", "M", 
  "China", "F", 
  "China", "M",
  "Bangladesh", "LM", 
  "Bangladesh", "LM", 
  "Laos", "LM",  
  "Laos", "LM", 
  "Ghana", "LM", 
  "Ghana", "LM", 
  "China", "UM", 
  "China", "UM",
)

>Solution :

Your desired output isn’t exactly summing the two columns, but rather transforming it from "wide" to "long". You can get your desired output in base R using mapply with c to combine them or using tidyr::pivot_longer() (more popular):

Base R:

mapply(c, 
       df1[c("Country", "Gender")],
       df1[c("Country", "Income")])

      Country      Gender
 [1,] "Bangladesh" "F"   
 [2,] "Bangladesh" "M"   
 [3,] "Laos"       "F"   
 [4,] "Laos"       "M"   
 [5,] "Ghana"      "F"   
 [6,] "Ghana"      "M"   
 [7,] "China"      "F"   
 [8,] "China"      "M"   
 [9,] "Bangladesh" "LM"  
[10,] "Bangladesh" "LM"  
[11,] "Laos"       "LM"  
[12,] "Laos"       "LM"  
[13,] "Ghana"      "LM"  
[14,] "Ghana"      "LM"  
[15,] "China"      "UM"  
[16,] "China"      "UM"

dplyr/tidyr

library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(df1, c(Gender, Income), values_to = "subgroup") %>%
  select(Country, subgroup)

Output (organized by country)

   Country    subgroup
   <chr>      <chr>   
 1 Bangladesh F       
 2 Bangladesh LM      
 3 Bangladesh M       
 4 Bangladesh LM      
 5 Laos       F       
 6 Laos       LM      
 7 Laos       M       
 8 Laos       LM      
 9 Ghana      F       
10 Ghana      LM      
11 Ghana      M       
12 Ghana      LM      
13 China      F       
14 China      UM      
15 China      M       
16 China      UM