I have a data frame with unequal numbers of rows per group, see df in the example below. I would like to add rows containing the group name and NAs in all other columns so that there is an equal number of rows per group like in df.desired. The rows should be added after the last row from the respective group.
Example:
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),
col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
> df
group col1 col2
1 A 1 12
2 A 1 13
3 A 1 14
4 A 1 15
5 B 2 21
6 B 2 22
7 B 2 23
8 C 3 31
9 C 3 32
df.desired = data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),
col1 = c(1, 1, 1, 1, 2, 2, 2, NA, 3, 3, NA, NA),
col2 = c(12, 13, 14, 15, 21, 22, 23, NA, 31, 32, NA, NA))
> df.desired
group col1 col2
1 A 1 12
2 A 1 13
3 A 1 14
4 A 1 15
5 B 2 21
6 B 2 22
7 B 2 23
8 B NA NA
9 C 3 31
10 C 3 32
11 C NA NA
12 C NA NA
I know how to do this with a loop but that would be super slow and I would prefer to use dplyr if possible. Does anyone have any ideas?
>Solution :
How about this:
library(dplyr)
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),
col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
maxgp <- max(table(df$group))
df %>%
group_by(group) %>%
summarise(across(everything(), ~c(.x, rep(NA, maxgp-n()))))
#> `summarise()` has grouped output by 'group'. You can override using the
#> `.groups` argument.
#> # A tibble: 12 × 3
#> # Groups: group [3]
#> group col1 col2
#> <chr> <dbl> <dbl>
#> 1 A 1 12
#> 2 A 1 13
#> 3 A 1 14
#> 4 A 1 15
#> 5 B 2 21
#> 6 B 2 22
#> 7 B 2 23
#> 8 B NA NA
#> 9 C 3 31
#> 10 C 3 32
#> 11 C NA NA
#> 12 C NA NA
Created on 2023-02-01 by the reprex package (v2.0.1)