Dataframe with unique values of some columns

January 12, 2024

I would like to build a data frame that lists column names of the original data frame together with their unique values, but only for columns whose number of unique values falls below a threshold. In other words, if a column has fewer than 5 unique values, it should be added as a row in the new data frame.
For example, given the following data frame:
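The threshold rule itself is just a per-column count of distinct values. A minimal base-R sketch, assuming a small hypothetical data frame `toy` (not the question's data) and a threshold of 5:

```r
# Hypothetical toy data frame, used only to illustrate the rule
toy <- data.frame(
  id     = 1:6,
  gender = c("male", "female", "female", "male", "male", "female"),
  score  = c(10, 20, 30, 40, 50, 60)
)

# Count the distinct values in every column
n_unique <- vapply(toy, function(x) length(unique(x)), integer(1))

# Keep the names of the columns below the threshold
threshold <- 5
selected <- names(n_unique)[n_unique < threshold]
selected  # "gender"
```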

df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), gender = c("male", 
"female", "female", "female", "female", "male", "male", "female", 
"female", "female"), ranking = c("low", "medium", "medium", "medium", 
"high", "low", "medium", "low", "low", "low"), comments = c("I was really dissapointed by the fact that there was no response", 
"I got feedback from them but I considered it a lie", "The feedback was really good and I felt convinced", 
"I was informed they will get back to me", "The feedback was appropriate to me", 
"I feel the contact person wasn't knowledgeable about the product", 
"I was told they will follow up within a week but they failed to", 
"I liked their customer service", "I was told that the issue will soon be addressed", 
"I am satisfied with the resonse they gave")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -10L))

The desired result would be:

[Image: desired result table]

Solution:

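You could keep only the columns with fewer than 5 unique values, collapse each column's unique values into a single string, and reshape the result into a two-column data frame (the `\(x)` lambda shorthand requires R 4.1 or later):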

library(tidyverse)

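df %>% 
  # keep only the columns with fewer than 5 unique values
  select(which(sapply(df, \(x) length(unique(x)) < 5))) %>%
  # collapse each remaining column's unique values into one string
  summarise(across(everything(), ~ paste(unique(.x), collapse = '; '))) %>%
  # turn the one-row result into a two-column data frame
  {data.frame(column = names(.), unique_values = unlist(.), row.names = NULL)}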
#>    column     unique_values
#> 1  gender      male; female
#> 2 ranking low; medium; high
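If you prefer to avoid the tidyverse, the same idea can be written in base R. This is a sketch, not the answer's method: `unique_summary()` is a hypothetical helper with the threshold and separator exposed as arguments, and the example data is a subset of the question's columns (`comments` is omitted for brevity).

```r
# Hypothetical helper: list columns with fewer than `threshold` unique values
unique_summary <- function(data, threshold = 5, sep = "; ") {
  # Number of distinct values in each column
  n_unique <- vapply(data, function(x) length(unique(x)), integer(1))
  keep <- names(n_unique)[n_unique < threshold]
  # Collapse each kept column's unique values into a single string
  values <- vapply(data[keep], function(x) paste(unique(x), collapse = sep),
                   character(1))
  data.frame(column = keep, unique_values = unname(values))
}

# Subset of the question's data
df <- data.frame(
  id      = 1:10,
  gender  = c("male", "female", "female", "female", "female",
              "male", "male", "female", "female", "female"),
  ranking = c("low", "medium", "medium", "medium", "high",
              "low", "medium", "low", "low", "low")
)

unique_summary(df)
```

Raising or lowering `threshold` changes which columns qualify; for example, `threshold = 3` keeps only `gender`.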