Home How do I exclude repeated observations while transforming data with pivot_wider?

Questions

How do I exclude repeated observations while transforming data with pivot_wider?

January 26, 2024

I am currently working on a data project and need to condense the data so that the names in the data frame do not repeat. The issue is that some lines of data are repeated. Below I have attached a sample data frame:

df_have <- data.frame(
  Name = c("Maya", "Maya", "Maya," "Sierra", "Sophia", "Sophia", "Sophia", 
           "Sophia", "Cecilia", "Cecilia"),
  ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)

And here is the desired data frame:

df_want <- data.frame(
  Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
  ID1 = c(24, 54, 12, 12),
  ID2 = c(56, 0, 15, 11),
  ID3 = c(0, 0, 24, 0)
)

I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:

ids |>
  mutate(idno = row_number(), .by = Name) |>
  pivot_wider(
    values_from = ID,
    names_from = idno,
    values_fill = 0,
    names_prefix = "ID"
  )

However, this does not exclude repeated values. I am using R to transform the data. The only command within

pivot_wider

that I am familiar with regarding duplicates only applies to the column names and not the entries themselves. Aditionally, I have tried the

duplicated

command, but this removed all duplicates and not just the desired ones. Thank you in advance for your help.

>Solution :

Your comment about how using distinct is distortive, doesnt obviously make sense, given it results in exactly the data that you declared that you wanted.
Is this a case of your example not being fully representative of your problem ?

df_have <- data.frame(
  Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia", 
           "Sophia", "Cecilia", "Cecilia"),
  ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)
# And here is the desired data frame:
  
  (df_want <- tibble(
    Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
    ID1 = c(24, 54, 12, 12),
    ID2 = c(56, 0, 15, 11),
    ID3 = c(0, 0, 24, 0)
  ))
# I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:
  
  df_result <- df_have |> distinct() |> 
  mutate(idno = row_number(), .by = Name) |>
  pivot_wider(
    values_from = ID,
    names_from = idno,
    values_fill = 0,
    names_prefix = "ID"
  )
  
  identical(df_want,df_result)