Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I exclude repeated observations while transforming data with pivot_wider?

I am currently working on a data project and need to condense the data so that the names in the data frame do not repeat. The issue is that some lines of data are repeated. Below I have attached a sample data frame:

df_have <- data.frame(
  Name = c("Maya", "Maya", "Maya," "Sierra", "Sophia", "Sophia", "Sophia", 
           "Sophia", "Cecilia", "Cecilia"),
  ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)

And here is the desired data frame:

df_want <- data.frame(
  Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
  ID1 = c(24, 54, 12, 12),
  ID2 = c(56, 0, 15, 11),
  ID3 = c(0, 0, 24, 0)
)

I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ids |>
  mutate(idno = row_number(), .by = Name) |>
  pivot_wider(
    values_from = ID,
    names_from = idno,
    values_fill = 0,
    names_prefix = "ID"
  )

However, this does not exclude repeated values. I am using R to transform the data. The only command within

pivot_wider 

that I am familiar with regarding duplicates only applies to the column names and not the entries themselves. Aditionally, I have tried the

duplicated 

command, but this removed all duplicates and not just the desired ones. Thank you in advance for your help.

>Solution :

Your comment about how using distinct is distortive, doesnt obviously make sense, given it results in exactly the data that you declared that you wanted.
Is this a case of your example not being fully representative of your problem ?

df_have <- data.frame(
  Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia", 
           "Sophia", "Cecilia", "Cecilia"),
  ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)
# And here is the desired data frame:
  
  (df_want <- tibble(
    Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
    ID1 = c(24, 54, 12, 12),
    ID2 = c(56, 0, 15, 11),
    ID3 = c(0, 0, 24, 0)
  ))
# I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:
  
  df_result <- df_have |> distinct() |> 
  mutate(idno = row_number(), .by = Name) |>
  pivot_wider(
    values_from = ID,
    names_from = idno,
    values_fill = 0,
    names_prefix = "ID"
  )
  
  identical(df_want,df_result)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading