Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to add rows so that each group has equal number of rows?

I have a data frame with unequal numbers of rows per group, see df in the example below. I would like to add rows containing the group name and NAs in all other columns so that there is an equal number of rows per group like in df.desired. The rows should be added after the last row from the respective group.

Example:

df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),  
                         col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                         col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
> df
  group col1 col2
1     A    1   12
2     A    1   13
3     A    1   14
4     A    1   15
5     B    2   21
6     B    2   22
7     B    2   23
8     C    3   31
9     C    3   32
df.desired = data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),  
                         col1 = c(1, 1, 1, 1, 2, 2, 2, NA, 3, 3, NA, NA),
                         col2 = c(12, 13, 14, 15, 21, 22, 23, NA, 31, 32, NA, NA))
> df.desired
   group col1 col2
1      A    1   12
2      A    1   13
3      A    1   14
4      A    1   15
5      B    2   21
6      B    2   22
7      B    2   23
8      B   NA   NA
9      C    3   31
10     C    3   32
11     C   NA   NA
12     C   NA   NA

I know how to do this with a loop but that would be super slow and I would prefer to use dplyr if possible. Does anyone have any ideas?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

How about this:

library(dplyr)
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),  
               col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
               col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
maxgp <- max(table(df$group))

df %>% 
  group_by(group) %>% 
  summarise(across(everything(), ~c(.x, rep(NA, maxgp-n()))))
#> `summarise()` has grouped output by 'group'. You can override using the
#> `.groups` argument.
#> # A tibble: 12 × 3
#> # Groups:   group [3]
#>    group  col1  col2
#>    <chr> <dbl> <dbl>
#>  1 A         1    12
#>  2 A         1    13
#>  3 A         1    14
#>  4 A         1    15
#>  5 B         2    21
#>  6 B         2    22
#>  7 B         2    23
#>  8 B        NA    NA
#>  9 C         3    31
#> 10 C         3    32
#> 11 C        NA    NA
#> 12 C        NA    NA

Created on 2023-02-01 by the reprex package (v2.0.1)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading