Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pivot_longer() with parallel (unlinked) sets of columns

I’m trying to use pivot_longer() to rearrange a dataset I was given, which looks like the result of a database join operation. Here’s an example of what it looks like:

dat <- tibble('Plant_Name'=c('tree','grass','bush','moss','weed','algae'),
       'Animal_Name'=c('duck',NA,'horse',NA,NA,'duck'),
       'Plant_Info'=c('tall','short','bushy','fuzzy','bad',NA),
       'Animal_Info'=c('wet',NA,'fast',NA,NA,'wet'))

# # A tibble: 6 × 4
# Plant_Name Animal_Name Plant_Info Animal_Info
# <chr>      <chr>       <chr>      <chr>      
# 1 tree       duck        tall       wet        
# 2 grass      NA          short      NA         
# 3 bush       horse       bushy      fast       
# 4 moss       NA          fuzzy      NA         
# 5 weed       NA          bad        NA         
# 6 algae      duck        NA         wet     

These data are not tidy, and are essentially just two other dataframes ("Plant" and "Animal") mashed together. I’m interested in "unjoining" the two separate datasets, and I’d like it to look like this:

# # A tibble: 7 × 3
# Class Name  Info 
# <chr>   <chr> <chr>
# 1 Animal  duck  wet  
# 2 Animal  horse fast 
# 3 Plant   bush  bushy
# 4 Plant   grass short
# 5 Plant   moss  fuzzy
# 6 Plant   tree  tall 
# 7 Plant   weed  bad 

I’m essentially wanting to do a pivot_wider() operation on two sets of columns that are not linked to each other. I could do this by creating two separate dataframes from each pair of columns, pivoting, then using bind_rows() to rejoin them, but I think there’s a way to do this in one fell swoop.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here’s the solution I’ve come up with. Problem is, using unite isn’t really scalable:

dat %>% 
  #This part is clunky: how to apply this to N pairs?
  unite('Plant',contains('Plant')) %>% unite('Animal',contains('Animal')) %>% 
  pivot_longer(everything(),names_to = 'Class') %>% 
  separate_wider_delim(cols = value,delim = '_',names=c('Name','Info')) %>%
  mutate(across(c(Name,Info),~ifelse(.x=='NA',NA,.x))) %>% #Change character "NA" to NA
  na.omit() %>% #Get rid of missing rows
  arrange(Class,Name) %>% distinct()

Is there a proper way to do this in pivot_wider() that I’m not aware of?

Edit: this superficially question is similar to other "pivot multiple column" questions (such as this one, this one, and this one, as well as examples from Data Wrangling: Advanced Pivoting), but these questions deal with independent pairs of columns.

>Solution :

dat |>
  pivot_longer(everything(), names_to = c("Class", ".value"), 
               names_sep = "_", cols_vary = "slowest")



 Class  Name  Info 
   <chr>  <chr> <chr>
 1 Plant  tree  tall 
 2 Plant  grass short
 3 Plant  bush  bushy
 4 Plant  moss  fuzzy
 5 Plant  weed  bad  
 6 Plant  algae NA   
 7 Animal duck  wet  
 8 Animal NA    NA   
 9 Animal horse fast 
10 Animal NA    NA   
11 Animal NA    NA   
12 Animal duck  wet 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading