Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

using group_by in dplyr to merge two character column into one

I have a data frame like the case below, and I would like to use group_by function of the dplyr in a way that Gender and Income merged into one column.

df1 <- tribble(
  ~Country, ~Gender, ~var1, ~var2, ~ var3, ~Income,
  "Bangladesh", "F", 2.5, 3, 1.5, "LM",
  "Bangladesh", "M", 4.5, 4.3, 2.7, "LM",
  "Laos", "F", 2.7, 3.2, 6.5, "LM", 
  "Laos", "M", 3.5, 5.1, 8.2, "LM", 
  "Ghana", "F", 8.5, 5, 7.5, "LM",
  "Ghana", "M", 4, 6.7, 1.3, "LM",
  "China", "F", 4.3, 6.1, 2.5, "UM",
  "China", "M", 6.2, 2.8, 6.8, "UM",
)

I have the option to use group_by to join two numeric clolumn like this:

df1 %>% 
  group_by(Country, subgroup = var1 + var2) %>%
  summarise()

But I cannot do that with character clolumn:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1 %>% 
  group_by(Country, subgroup = Gender + Income) %>%
  summarise()

#Error: ! non-numeric argument to binary operator

what I would like after groupuing by is something like below

df2 <- tribble(
  ~Country, ~subgroup, 
  "Bangladesh", "F", 
  "Bangladesh", "M", 
  "Laos", "F",  
  "Laos", "M", 
  "Ghana", "F", 
  "Ghana", "M", 
  "China", "F", 
  "China", "M",
  "Bangladesh", "LM", 
  "Bangladesh", "LM", 
  "Laos", "LM",  
  "Laos", "LM", 
  "Ghana", "LM", 
  "Ghana", "LM", 
  "China", "UM", 
  "China", "UM",
)

>Solution :

Your desired output isn’t exactly summing the two columns, but rather transforming it from "wide" to "long". You can get your desired output in base R using mapply with c to combine them or using tidyr::pivot_longer() (more popular):

Base R:

mapply(c, 
       df1[c("Country", "Gender")],
       df1[c("Country", "Income")])

      Country      Gender
 [1,] "Bangladesh" "F"   
 [2,] "Bangladesh" "M"   
 [3,] "Laos"       "F"   
 [4,] "Laos"       "M"   
 [5,] "Ghana"      "F"   
 [6,] "Ghana"      "M"   
 [7,] "China"      "F"   
 [8,] "China"      "M"   
 [9,] "Bangladesh" "LM"  
[10,] "Bangladesh" "LM"  
[11,] "Laos"       "LM"  
[12,] "Laos"       "LM"  
[13,] "Ghana"      "LM"  
[14,] "Ghana"      "LM"  
[15,] "China"      "UM"  
[16,] "China"      "UM"  

dplyr/tidyr

library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(df1, c(Gender, Income), values_to = "subgroup") %>%
  select(Country, subgroup)

Output (organized by country)

   Country    subgroup
   <chr>      <chr>   
 1 Bangladesh F       
 2 Bangladesh LM      
 3 Bangladesh M       
 4 Bangladesh LM      
 5 Laos       F       
 6 Laos       LM      
 7 Laos       M       
 8 Laos       LM      
 9 Ghana      F       
10 Ghana      LM      
11 Ghana      M       
12 Ghana      LM      
13 China      F       
14 China      UM      
15 China      M       
16 China      UM  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading