I am currently working on a data project and need to condense the data so that the names in the data frame do not repeat. The issue is that some lines of data are repeated. Below I have attached a sample data frame:
df_have <- data.frame(
Name = c("Maya", "Maya", "Maya," "Sierra", "Sophia", "Sophia", "Sophia",
"Sophia", "Cecilia", "Cecilia"),
ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)
And here is the desired data frame:
df_want <- data.frame(
Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
ID1 = c(24, 54, 12, 12),
ID2 = c(56, 0, 15, 11),
ID3 = c(0, 0, 24, 0)
)
I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:
ids |>
mutate(idno = row_number(), .by = Name) |>
pivot_wider(
values_from = ID,
names_from = idno,
values_fill = 0,
names_prefix = "ID"
)
However, this does not exclude repeated values. I am using R to transform the data. The only command within
pivot_wider
that I am familiar with regarding duplicates only applies to the column names and not the entries themselves. Aditionally, I have tried the
duplicated
command, but this removed all duplicates and not just the desired ones. Thank you in advance for your help.
>Solution :
Your comment about how using distinct is distortive, doesnt obviously make sense, given it results in exactly the data that you declared that you wanted.
Is this a case of your example not being fully representative of your problem ?
df_have <- data.frame(
Name = c("Maya", "Maya", "Maya", "Sierra", "Sophia", "Sophia", "Sophia",
"Sophia", "Cecilia", "Cecilia"),
ID = c(24, 56, 24, 54, 12, 12, 15, 24, 12, 11)
)
# And here is the desired data frame:
(df_want <- tibble(
Name = c("Maya", "Sierra", "Sophia", "Cecilia"),
ID1 = c(24, 54, 12, 12),
ID2 = c(56, 0, 15, 11),
ID3 = c(0, 0, 24, 0)
))
# I previously posted a question very similar to this. From that, the transformation I am currently performing on the data is as follows:
df_result <- df_have |> distinct() |>
mutate(idno = row_number(), .by = Name) |>
pivot_wider(
values_from = ID,
names_from = idno,
values_fill = 0,
names_prefix = "ID"
)
identical(df_want,df_result)