Add empty rows at specific positions of dataframe

November 23, 2021

I want to add empty rows at specific positions of a dataframe. Let’s say we have this dataframe:

df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
                 var2 = c(9,8,7,6,5,4,3,2,1))

I want to add an empty row after rows 1, 3 and 5. (I know this is not best practice in most cases; ultimately I want to create a table with flextable.) These row numbers are stored in a vector:

rows <- c(1,3,5)

Now I want to loop over the rows vector and use add_row() from the tibble package to add an empty row after each of those rows:

library(tibble)  # provides add_row()

for (i in rows) {
  df <- add_row(df, .after = i)
}

The problem is that while the first iteration works, the later empty rows are misplaced, because each insertion makes the dataframe one row longer and shifts the remaining positions. To fix this, I tried adding 1 to the vector after each iteration:

for (i in rows) {
  df <- add_row(df, .after = i)
  rows <- rows + 1
}

This does not work either. I assume the rows vector is only evaluated once, when the loop starts. Does anyone have any ideas?

Solution:

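Do it all at once; there is no need to loop. Build the sequence of row numbers, append the positions where empty rows should go, and sort. Index the dataframe with the result, so each of those rows appears twice, then set the duplicated copies to NA: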

s <- sort(c(seq_len(nrow(df)), rows))
out <- df[s,]
out[duplicated(s),] <- NA

#    var1 var2
#1      1    9
#1.1   NA   NA
#2      2    8
#3      3    7
#3.1   NA   NA
#4      4    6
#5      5    5
#5.1   NA   NA
#6      6    4
#7      7    3
#8      8    2
#9      9    1
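
If this is needed in more than one place, the three lines above can be wrapped in a small function. The following is only a sketch: the name insert_blank_rows and its after argument are made up for illustration (they are not from any package), and it assumes every position lies between 1 and nrow(df). It also resets the row names, which otherwise come out as 1.1, 3.1 and so on, as in the output above.

```r
# Hypothetical helper (name and arguments are illustrative only):
# insert an all-NA row after each of the given row positions.
insert_blank_rows <- function(df, after) {
  # Assumption: positions refer to existing rows of df
  stopifnot(all(after >= 1), all(after <= nrow(df)))
  # Every original row index once, plus one extra copy per position
  s <- sort(c(seq_len(nrow(df)), after))
  out <- df[s, , drop = FALSE]
  # The repeated copy of each index becomes the empty row
  out[duplicated(s), ] <- NA
  # Replace row names such as "1.1" with a plain 1..n sequence
  rownames(out) <- NULL
  out
}

df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
                 var2 = c(9,8,7,6,5,4,3,2,1))
res <- insert_blank_rows(df, c(1, 3, 5))
```

Here res has 12 rows, with the empty ones directly after the original rows 1, 3 and 5.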

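This is much faster than looping, even for moderately sized data. In the timings below the dataframe is expanded to 90,000 rows and an empty row is inserted at 900 positions (after row 1, 101, 201, and so on):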

df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)

system.time({
for (i in rev(rows)) {
  df <- tibble::add_row(df, .after = i)
}
})
#   user  system elapsed 
#  23.94    0.05   23.98 

# Rebuild a 90,000-row dataframe (df was modified by the loop above)
df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)

system.time({
  s <- sort(c(seq_len(nrow(df)), rows))
  out <- df[s,]
  out[duplicated(s),] <- NA
})
#   user  system elapsed 
#   0.00    0.01    0.02
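
On why the loop in the question fails and why the timed loop runs over rev(rows): a for loop in R evaluates its sequence once, before the first iteration, so changing rows inside the loop does not change the values i takes. Inserting from the last position to the first avoids the problem, because rows added further down never shift the positions that are still to be processed. The sketch below illustrates both points on the original 9-row data. It assumes the tibble package is installed; visited, positions and looped are names introduced here for the illustration only.

```r
library(tibble)  # provides add_row(); assumed to be installed

df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
                 var2 = c(9,8,7,6,5,4,3,2,1))
rows <- c(1, 3, 5)

# 1. The loop sequence is fixed when the loop starts: updating
#    `positions` inside the body does not affect the values of `i`.
visited <- numeric(0)
positions <- rows
for (i in positions) {
  visited <- c(visited, i)
  positions <- positions + 1
}
# visited holds 1, 3, 5; positions has become 4, 6, 8

# 2. Looping from the last position to the first keeps the
#    remaining positions valid after each insertion.
looped <- df
for (i in rev(rows)) {
  looped <- add_row(looped, .after = i)
}

# 3. The vectorised version, for comparison
s <- sort(c(seq_len(nrow(df)), rows))
out <- df[s, ]
out[duplicated(s), ] <- NA
```

Both looped and out have 12 rows with the empty rows in the same places; only the row names differ.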