How to filter a list of dataframes based on the number of unique levels of a categorical column in each dataframe?

I have a dataframe that I split into a list of dataframes based on a categorical variable in the dataframe:

library(ggplot2)  # provides the mpg dataset
list <- split(mpg, mpg$manufacturer)
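To see what split() produces without loading ggplot2, here is a minimal sketch on a small hypothetical data frame (toy, with made-up manufacturers "a", "b" and "c") standing in for mpg. The result is a named list holding one data frame per value of the splitting variable.

```r
# Hypothetical toy data standing in for ggplot2::mpg (not the real dataset)
toy <- data.frame(
  manufacturer = c("a", "a", "a", "b", "b", "c"),
  class        = c("suv", "compact", "pickup", "suv", "suv", "compact"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one data frame per manufacturer
dfs <- split(toy, toy$manufacturer)
names(dfs)         # "a" "b" "c"
sapply(dfs, nrow)  # a: 3 rows, b: 2 rows, c: 1 row
```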

I want to filter the list so that it only keeps the dataframes in which a given categorical column (here, class) contains at least 5 unique values, and drops those with fewer than 5.
I have tried lapply with filter, but that filters the rows within each dataframe rather than removing whole dataframes from the list. I also tried:
filteredlist <- lapply(list, function(x) length(unique(x$class) >= 5))
and am stumped.

Thanks, any help would be appreciated!


Solution:

First let’s take a look at how many unique classes there are:

sapply(list, \(x) length(unique(x$class)))
   #    audi  chevrolet      dodge       ford      honda    hyundai       jeep land rover    lincoln 
   #       2          3          3          3          1          2          1          1          1 
   # mercury     nissan    pontiac     subaru     toyota volkswagen 
   #       1          3          1          3          4          3 
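The same per-manufacturer counts can also be computed directly on the unsplit data frame with base R's tapply(), which may be handy if you prefer to decide what to keep before splitting. A minimal sketch, again on a hypothetical toy data frame rather than mpg; the subsetting is done by name, so it does not rely on split() and tapply() ordering their groups the same way.

```r
# Hypothetical toy data standing in for ggplot2::mpg
toy <- data.frame(
  manufacturer = c("a", "a", "a", "b", "b", "c"),
  class        = c("suv", "compact", "pickup", "suv", "suv", "compact"),
  stringsAsFactors = FALSE
)

# Number of distinct classes per manufacturer, computed before splitting
n_classes <- tapply(toy$class, toy$manufacturer, function(v) length(unique(v)))
n_classes  # a: 3, b: 1, c: 1

# Keep only the manufacturers that meet the threshold, then split and subset by name
keep <- names(n_classes)[n_classes >= 3]
filtered <- split(toy, toy$manufacturer)[keep]
names(filtered)  # "a"
```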

So, with this data, a threshold of >= 5 isn’t a great example: no manufacturer has more than 4 unique classes, so the result would be empty. Let’s use >= 3 instead so we can expect a non-empty result.

## with Filter
filteredlist <- Filter(function(x) length(unique(x$class)) >= 3, list)
length(filteredlist)
# [1] 7

## or with sapply and `[`
sapply_filter = list[sapply(list, \(x) length(unique(x$class))) >= 3]
length(sapply_filter)
# [1] 7
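If you need this for different columns or thresholds, the predicate can be wrapped in a small helper. The function name keep_min_levels and its arguments are hypothetical, chosen for this sketch; it uses Filter() just as above, with the column name passed as a string so that df[[col]] can look it up. Toy data again stands in for mpg.

```r
# Hypothetical helper: keep the data frames in `dfs` whose column `col`
# has at least `min_levels` distinct values
keep_min_levels <- function(dfs, col, min_levels) {
  Filter(function(df) length(unique(df[[col]])) >= min_levels, dfs)
}

# Hypothetical toy data standing in for ggplot2::mpg
toy <- data.frame(
  manufacturer = c("a", "a", "a", "b", "b", "c"),
  class        = c("suv", "compact", "pickup", "suv", "compact", "compact"),
  stringsAsFactors = FALSE
)
dfs <- split(toy, toy$manufacturer)

names(keep_min_levels(dfs, "class", 2))   # "a" "b"
length(keep_min_levels(dfs, "class", 5))  # 0: no data frame qualifies
```

With the real data this would be called as keep_min_levels(list, "class", 3).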

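Note that your attempt lapply(list, function(x) length(unique(x$class) >= 5)) has a misplaced parenthesis: you want length(unique(x$class)) >= 5, not length(unique(x$class) >= 5). Even with that fixed, lapply() only returns a list of TRUE/FALSE values, one per dataframe, so you still need Filter() or [ to drop the dataframes that fail the test.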
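To see concretely what the misplaced parenthesis does, here is a sketch on a single hypothetical data frame (df, made up for illustration). In the original expression, unique(df$class) >= 5 compares each class name to the string "5", so length() just counts the unique classes; the corrected expression returns a single TRUE/FALSE, which is what Filter() or [ needs.

```r
# Hypothetical single data frame with 3 distinct classes
df <- data.frame(class = c("suv", "compact", "pickup", "suv"),
                 stringsAsFactors = FALSE)

# Original: the comparison happens inside length() (against the string "5"),
# so this returns the number of unique classes, not a TRUE/FALSE
buggy <- length(unique(df$class) >= 5)
buggy  # 3

# Corrected: count first, then compare, giving a single logical
fixed <- length(unique(df$class)) >= 5
fixed  # FALSE
```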
