How to fill a column by group with sampled row numbers according to n per group

I am working with a data frame in R. The groups are defined by the column Group1. I need to create a new column named sampled and fill it with a specific value (999) in the rows selected by calling sample() within each group, drawing from 1 to the number of rows in that group. Here is the data I have:

library(tidyverse)
# Data: 15 rows, each randomly assigned to group a, b or c (no seed is set)
dat <- data.frame(Group1 = sample(letters[1:3], 15, replace = TRUE))

In one run, dat looks like this:

dat
   Group1
1       b
2       a
3       a
4       c
5       c
6       c
7       a
8       b
9       c
10      b
11      a
12      b
13      c
14      c
15      c

To get the number of rows N in each group, I do this:


# Count the rows in each group
dat %>% 
  arrange(Group1) %>%
  group_by(Group1) %>%
  mutate(N=n())
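For comparison, the same per-group count can be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is not part of the original question; the data line repeats the question's random setup, so the group sizes differ between runs.

```r
# Toy data as in the question (random group sizes; no seed set)
dat <- data.frame(Group1 = sample(letters[1:3], 15, replace = TRUE))

# Sort by group, as arrange(Group1) does
dat <- dat[order(dat$Group1), , drop = FALSE]

# N = size of each row's group; ave() recycles length() back onto every row of the group
dat$N <- ave(seq_len(nrow(dat)), dat$Group1, FUN = length)
dat
```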

Which produces:

# A tibble: 15 x 2
# Groups:   Group1 [3]
   Group1     N
   <chr>  <int>
 1 a          4
 2 a          4
 3 a          4
 4 a          4
 5 b          4
 6 b          4
 7 b          4
 8 b          4
 9 c          7
10 c          7
11 c          7
12 c          7
13 c          7
14 c          7
15 c          7

Next, for each group I need to draw a sample of 3 numbers from 1:N. For group a, with N = 4, that is sample(1:4, 3), which in one run produced [1] 2 4 3. The rows of group a at those positions must then be filled with 999, and the remaining row left as NA. So for the first group we would have:
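The step for a single group can be sketched in base R. This is a minimal illustration assuming N = 4 and 3 draws, as in the example for group a; the seed is arbitrary, so the drawn positions need not match the 2 4 3 quoted above.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

N <- 4                           # number of rows in group "a"
picked <- sample(seq_len(N), 3)  # 3 distinct positions drawn from 1:N

# Rows whose position is among the draws get 999; the rest stay NA
sampled <- ifelse(seq_len(N) %in% picked, 999L, NA_integer_)

picked
sampled
```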

   Group1     N sampled
   <chr>  <int>   <int>
 1 a          4    NA
 2 a          4    999
 3 a          4    999
 4 a          4    999

The same should then happen for the remaining groups, so that each group gets its own random selection of rows. Is it possible to do this with dplyr or the tidyverse? Many thanks!

Solution:

You could try:

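set.seed(3242)  # makes the random draw reproducible

library(dplyr)

dat %>%
  arrange(Group1) %>%
  add_count(Group1, name = 'N') %>%  # N = number of rows in each group
  group_by(Group1) %>%
  mutate(
    sampled = case_when(
      # mark rows whose position within the group is among 3 draws from 1:N
      row_number() %in% sample(1:n(), 3L) ~ 999L,
      TRUE ~ NA_integer_  # all other rows stay NA
    )
  )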
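One caveat, not raised in the original answer: sample(1:n(), 3L) stops with the error "cannot take a sample larger than the population when 'replace = FALSE'" if a group has fewer than 3 rows, which can happen with the question's random data. A minimal sketch that caps the draw at the group size is below. It assumes that marking every row of a small group is acceptable, uses hypothetical data in which group b has only 2 rows, and swaps case_when() for if_else(), which is equivalent for a single condition.

```r
library(dplyr)

# Hypothetical data: group "b" has only 2 rows, fewer than the 3 to be drawn
dat <- data.frame(Group1 = rep(c("a", "b", "c"), times = c(4, 2, 7)))
k <- 3L  # rows to mark per group

set.seed(3242)
res <- dat %>%
  arrange(Group1) %>%
  add_count(Group1, name = "N") %>%
  group_by(Group1) %>%
  mutate(
    # Draw min(k, N) positions so small groups are fully marked instead of erroring
    sampled = if_else(row_number() %in% sample(seq_len(n()), min(k, n())),
                      999L, NA_integer_)
  ) %>%
  ungroup()

res
```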

Output:

# A tibble: 15 × 3
# Groups:   Group1 [3]
   Group1     N sampled
   <chr>  <int>   <int>
 1 a          4     999
 2 a          4     999
 3 a          4      NA
 4 a          4     999
 5 b          4     999
 6 b          4     999
 7 b          4     999
 8 b          4      NA
 9 c          7      NA
10 c          7     999
11 c          7      NA
12 c          7     999
13 c          7      NA
14 c          7      NA
15 c          7     999
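A quick way to check output like the above is to count the marked rows per group and to rerun with the same seed. The sketch below assumes deterministic stand-in data with the same group sizes as the question (4, 4 and 7) and wraps the answer's pipeline in a hypothetical helper, mark_rows(); it does not try to reproduce the exact rows shown.

```r
library(dplyr)

# Hypothetical stand-in for the question's data, with the same group sizes (4, 4, 7)
dat <- data.frame(Group1 = rep(c("a", "b", "c"), times = c(4, 4, 7)))

# Hypothetical helper: the answer's pipeline, run after setting the given seed
mark_rows <- function(df, seed) {
  set.seed(seed)
  df %>%
    arrange(Group1) %>%
    add_count(Group1, name = "N") %>%
    group_by(Group1) %>%
    mutate(sampled = case_when(
      row_number() %in% sample(1:n(), 3L) ~ 999L,
      TRUE ~ NA_integer_
    )) %>%
    ungroup()
}

run1 <- mark_rows(dat, 3242)
run2 <- mark_rows(dat, 3242)

# Same seed, same draw: both runs mark exactly the same rows
identical(run1$sampled, run2$sampled)

# One row per group: its size and how many rows were marked with 999
check <- run1 %>%
  group_by(Group1) %>%
  summarise(N = first(N), n_marked = sum(!is.na(sampled)))
check
```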