Create new column with distinct character values

April 11, 2023

I have the following dataset:

ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","red","red","blue","green")
df <- data.frame(ID,color)

I want the "color" column to contain only the distinct colors within each ID.
For example, ID 1 has 7 observations with repeated colors, but it should be reduced to 3 rows because it has only 3 distinct colors, and likewise for ID 2. The desired output, with an added n_color column holding the number of distinct colors per ID, is:

ID <- c(1,1,1,2,2,2,2)
n_color <- c(3,3,3,4,4,4,4)
color <- c("red","blue","green",
           "yellow","red","blue","green")
df <- data.frame(ID,n_color,color)

I know I can use the following to count the distinct colors per ID, but I couldn't figure out how to get the output described above.

library(dplyr)

df %>%
  group_by(ID) %>%
  summarize(n = n_distinct(color)) %>%
  ungroup()

Is there a way to do this? I would appreciate all the help there is! Thanks!

>Solution :

Use distinct() on the grouped data to keep one row per ID and color, then add the per-ID count with n():

library(dplyr)

df %>% 
  group_by(ID) %>% 
  distinct(color, .keep_all = TRUE) %>% 
  mutate(n_color = n(), .after = ID) %>% 
  ungroup()
# A tibble: 7 × 3
     ID n_color color 
  <dbl>   <int> <chr> 
1     1       3 red   
2     1       3 blue  
3     1       3 green 
4     2       4 yellow
5     2       4 red   
6     2       4 blue  
7     2       4 green
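
The same result can probably be reached without grouping at all. The sketch below is an alternative, not part of the original answer: it assumes dplyr 1.0.0 or later (for relocate()) and uses the hypothetical name `result` for the output. It first drops duplicate ID/color pairs with distinct(), then counts the remaining rows per ID with add_count().

```r
library(dplyr)

# Data from the question
ID <- c(1,1,1,1,1,1,1,2,2,2,2,2)
color <- c("red","red","red","blue","green","green","blue",
           "yellow","red","red","blue","green")
df <- data.frame(ID, color)

# `result` is a name chosen for this sketch
result <- df %>%
  distinct(ID, color) %>%              # one row per ID/color pair
  add_count(ID, name = "n_color") %>%  # rows left per ID = distinct colors
  relocate(n_color, .after = ID)       # put the count right after ID

result
```

Because distinct() is given both columns, no group_by()/ungroup() pair is needed, and the rows keep their order of first appearance, so this should yield the same seven rows as above (as a plain data frame rather than a tibble).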