I would like to build a data frame that lists each variable of the original data frame alongside its unique values, but only for variables under a threshold. In other words, if a column has fewer than 5 unique values, it should become a row in the new data frame.
For example, given the following data frame:
df <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), gender = c("male",
"female", "female", "female", "female", "male", "male", "female",
"female", "female"), ranking = c("low", "medium", "medium", "medium",
"high", "low", "medium", "low", "low", "low"), comments = c("I was really dissapointed by the fact that there was no response",
"I got feedback from them but I considered it a lie", "The feedback was really good and I felt convinced",
"I was informed they will get back to me", "The feedback was appropriate to me",
"I feel the contact person wasn't knowledgeable about the product",
"I was told they will follow up within a week but they failed to",
"I liked their customer service", "I was told that the issue will soon be addressed",
"I am satisfied with the resonse they gave")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
the desired result would be
>Solution :
You could do
library(tidyverse)
df %>%
  select(which(sapply(df, \(x) length(unique(x)) < 5))) %>%
  summarise(across(everything(), ~ paste(unique(.x), collapse = '; '))) %>%
  {data.frame(column = names(.), unique_values = unlist(.), row.names = NULL)}
#>    column     unique_values
#> 1  gender      male; female
#> 2 ranking low; medium; high
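If you would rather not depend on the tidyverse, or want the threshold as a parameter, the same idea can be sketched in base R. This is only one possible way to write it, not the approach above: the helper name `unique_value_table` and its arguments `max_unique` and `sep` are made up for illustration, and the small `example` data frame is a shortened stand-in for the data in the question.

```r
# Hypothetical helper (not part of the original answer): returns one row per
# column that has fewer than `max_unique` distinct values.
unique_value_table <- function(data, max_unique = 5, sep = "; ") {
  # Distinct values of every column, in order of first appearance
  vals <- lapply(data, unique)
  # Keep only the columns that fall under the threshold
  keep <- lengths(vals) < max_unique
  data.frame(
    column = names(vals)[keep],
    # Collapse each column's distinct values into a single string
    unique_values = vapply(vals[keep], paste, character(1),
                           collapse = sep, USE.NAMES = FALSE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Shortened stand-in for the question's data (assumed for illustration)
example <- data.frame(
  id = 1:6,
  gender = c("male", "female", "female", "female", "female", "male"),
  ranking = c("low", "medium", "medium", "medium", "high", "low"),
  stringsAsFactors = FALSE
)

result <- unique_value_table(example)
print(result)
```

With this stand-in data, `id` has 6 distinct values and is dropped, while `gender` and `ranking` are kept. Passing a different `max_unique` changes the cutoff without touching the rest of the code.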