Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using tidyr::pivot_wider when some keys have multiple values

I have a long data frame that i want to widen, but one key has two different values:

df <- data.frame(ColA=c("A", "B", "B", "C"), ColB=letters[23:26])

  ColA ColB
1    A    w
2    B    x
3    B    y
4    C    z

I want my output to be a paste of the two values for this key together:

  ColA ColB
1    A    w
2    B    xy
3    C    z

A regular pivot_wider() will throw a warning and convert the values to lists:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df.wide <- df %>%
    pivot_wider(names_from=ColA, values_from=ColB)

Warning message:
Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates 

# A tibble: 1 x 3
  A         B         C        
  <list>    <list>    <list>   
1 <chr [1]> <chr [2]> <chr [1]>

Based on the warning it looks like pivot_wider() with a value_fn() is similar to what I want as an intermediate step:

# intermediate step
df.wide <- df %>%
    pivot_wider(names_from=ColA, values_from=ColB, values_fn=SOMETHING)

   A  B  C
1  w  xy z

But it looks like values_fn() only takes summary functions, and not something that would work on character data (like paste())

The closest I can get is:

df %>%
  pivot_wider(names_from=ColA, values_from=ColB, values_fn=list) %>% 
  mutate(across(everything(), as.character)) %>%
  pivot_longer(cols=everything(), names_to="ColA", values_to="ColB")

# A tibble: 3 x 2
  ColA  ColB             
  <chr> <chr>            
1 A     "w"              
2 B     "c(\"x\", \"y\")"
3 C     "z" 

With an additional mutating gsub()-type function. Surely there’s an easier way! Preferably within the tidyverse, but also open to other packages.

Thanks

>Solution :

I don’t think you need to pivot here, unless your real data is more complicated than the example shown.

library(dplyr)

df %>% 
  group_by(ColA) %>% 
  summarise(ColB = paste0(ColB, collapse = ""))

Result:

# A tibble: 3 × 2
  ColA  ColB 
  <chr> <chr>
1 A     w    
2 B     xy   
3 C     z  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading