Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Creating new column by combining two columns, but variables need to be in alphabetical order

I have the following dataset:

ID<-c("A","B","C","D","E")
Fruits1<-c("orange","apple","pineapple","apple","pineapple")
Fruits2<-c("apple","orange","apple","pineapple","orange")
data<-tibble(ID,Fruits1,Fruits2)
data
# A tibble: 5 × 3
ID    Fruits1   Fruits2  
<chr> <chr>     <chr>    
1 A     orange    apple    
2 B     apple     orange   
3 C     pineapple apple    
4 D     apple     pineapple
5 E     Pineapple orange  

I want to create a new column called FruitsDiff, which combines columns Fruits1 and Fruits2 like so:

# A tibble: 5 × 4
ID    Fruits1   Fruits2   FruitsDiff      
<chr> <chr>     <chr>     <chr>           
1 A     orange    apple     apple-orange    
2 B     apple     orange    apple-orange    
3 C     pineapple apple     apple-pineapple 
4 D     apple     pineapple apple-pineapple 
5 E     pineapple orange    orange-pineapple

My requirement for this new column is that the fruits be ordered alphabetically, regardless of which fruit comes first in the data frame (so for example, for line 1, even if orange comes before apple, the variable in FruitsDiff is apple-orange).
I usually use paste() to combine columns, but this is not going to help me in this situation.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Any suggestions? Also, the separator between the two variables can be anything, I just used a dash for the sake of the example.

>Solution :

We get the elementwise min/max between the ‘Fruit’ columns with pmin/pmax (where the min/max will be based on alphabetic order) and paste (str_c) the output from those functions to create the new column ‘FruitsDiff’

library(dplyr)
library(stringr)
data %>% 
 mutate(FruitsDiff = str_c(pmin(Fruits1, Fruits2), 
      pmax(Fruits1, Fruits2), sep = '-'))

-output

# A tibble: 5 × 4
  ID    Fruits1   Fruits2   FruitsDiff      
  <chr> <chr>     <chr>     <chr>           
1 A     orange    apple     apple-orange    
2 B     apple     orange    apple-orange    
3 C     pineapple apple     apple-pineapple 
4 D     apple     pineapple apple-pineapple 
5 E     pineapple orange    orange-pineapple

Or in base R, use the same pmin/pmax to paste

data$FruitsDiff <- with(data, paste(pmin(Fruits1, Fruits2), 
      pmax(Fruits1, Fruits2), sep = '-'))

Or with apply and MARGIN = 1, loop over the rows, sort the elements and paste with collapse as argument

data$FruitsDiff <-  apply(data[-1], 1, \(x) paste(sort(x), collapse = '-'))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading