I have the following dataset:
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
"yellow","red","red","blue","green")
df <- data.frame(ID,color)
I want the "color" column to contain only the distinct colors for each ID.
For example, ID 1 has 7 observations with repeated colors, but I want to keep just its distinct colors, so ID 1 would have only 3 observations because it has only 3 distinct colors, and so on. The desired result would be:
ID <- c(1,1,1,2,2,2,2)
n_color <- c(3,3,3,4,4,4,4)
color <- c("red","blue","green",
"yellow","red","blue","green")
df <- data.frame(ID,n_color,color)
I know I can use the following to count the distinct colors per ID, but I couldn't figure out how to get the result described above.
df %>%
  group_by(ID) %>%
  summarize(n = n_distinct(color)) %>%
  ungroup()
Is there a way to do this? I would appreciate any help. Thanks!
>Solution :
Use distinct() within each ID group, then count the remaining rows with n():
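library(dplyr)

df %>%
  group_by(ID) %>%
  distinct(color, .keep_all = TRUE) %>%   # keep the first row of each color within each ID
  mutate(n_color = n(), .after = ID) %>%  # rows left per ID = number of distinct colors
  ungroup()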
# A tibble: 7 × 3
     ID n_color color
  <dbl>   <int> <chr>
1     1       3 red
2     1       3 blue
3     1       3 green
4     2       4 yellow
5     2       4 red
6     2       4 blue
7     2       4 green
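
If you prefer to avoid the explicit group_by()/ungroup() pair, something along these lines should give the same rows. This is only a sketch using the sample data from the question: it drops duplicate ID/color pairs with distinct(), then uses add_count() to attach the per-ID row count. The name `result` is just a placeholder for illustration.

```r
library(dplyr)

# Sample data from the question
ID <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
color <- c("red", "red", "red", "blue", "green", "green", "blue",
           "yellow", "red", "red", "blue", "green")
df <- data.frame(ID, color)

result <- df %>%
  distinct(ID, color) %>%              # one row per unique ID/color pair
  add_count(ID, name = "n_color") %>%  # rows per ID = number of distinct colors
  relocate(n_color, .after = ID)       # put the count next to ID

print(result)
```

Because distinct() is given both columns here, no grouping is needed, and the result stays an ordinary data frame rather than a grouped tibble.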