If the number of rows in a group exceeds X, randomly sample X rows from that group

I need to reduce the number of rows in a data set. My strategy is this: if the number of rows in a group exceeds X, randomly sample X rows from that group; otherwise keep all of the group's rows.

Assume the following data set:

set.seed(123)
n <- 10

df <- data.frame(id = 1:n,
                 group = sample(1:3, n, replace = TRUE))

> df
   id group
1   1     3
2   2     3
3   3     3
4   4     2
5   5     3
6   6     2
7   7     2
8   8     2
9   9     3
10 10     1

Here X is 2. Let’s count the number of rows in each group.

> table(df$group)

1 2 3 
1 4 5 

This means that in the end result I want 1 observation in group 1 and 2 observations in each of groups 2 and 3. The rows that are kept in groups 2 and 3 should be randomly selected. This would reduce the data’s size from 10 rows to 5.
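
As a quick check of that target size, here is a minimal sketch that caps each group's count at X and sums the result. It hard-codes the group values printed above rather than re-drawing them, and the names group_sizes, kept_per_group and expected_rows are only illustrative.

```r
# Same values as the example data printed above
df <- data.frame(id = 1:10,
                 group = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1))
X <- 2

# Rows per group (groups 1, 2, 3 have 1, 4, 5 rows)
group_sizes <- as.integer(table(df$group))

# Cap each group's count at X, then add the capped counts up
kept_per_group <- pmin(group_sizes, X)
expected_rows <- sum(kept_per_group)
expected_rows  # 5
```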

How do I do this in an efficient way?

Thanks!

>Solution :

Here is one way: group by the group column, then put a condition inside slice(). If the number of rows in the group (n()) is at least X, sample X of the row positions (row_number()) without replacement; otherwise return all of the group's row positions (shuffled here with sample(row_number()), although plain row_number() would keep the same rows).

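library(dplyr)
X <- 2
df %>% 
  group_by(group) %>% 
  # groups with at least X rows: draw X row positions at random;
  # smaller groups: keep every row (in shuffled order)
  slice(if (n() >= X) sample(row_number(), X, replace = FALSE) else 
     sample(row_number())) %>%
  ungroup()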

-output

# A tibble: 5 × 2
     id group
  <int> <int>
1    10     1
2     8     2
3     4     2
4     1     3
5     9     3
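
A possible alternative, assuming dplyr 1.0.0 or later is installed: slice_sample() samples without replacement by default and silently truncates n to the group size, so the if/else is not needed. This is a sketch rather than part of the original answer. The data is hard-coded from the printed example, and the seed and the name reduced are arbitrary choices.

```r
library(dplyr)

# Same values as the example data printed in the question
df <- data.frame(id = 1:10,
                 group = c(3, 3, 3, 2, 3, 2, 2, 2, 3, 1))
X <- 2

set.seed(42)  # arbitrary seed, only so the draw is repeatable

reduced <- df %>%
  group_by(group) %>%
  slice_sample(n = X) %>%  # groups with fewer than X rows are kept whole
  ungroup()

# Every group now has at most X rows
count(reduced, group)
```

Which rows are picked in groups 2 and 3 depends on the seed, but the result always has 5 rows for this data.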