Add empty rows at specific positions of dataframe

I want to add empty rows at specific positions of a dataframe. Let’s say we have this dataframe:

df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
     var2 = c(9,8,7,6,5,4,3,2,1))

I want to add an empty row after rows 1, 3 and 5. (I know this is not best practice in most cases; ultimately I want to create a table using flextable.) These row numbers are saved in a vector:

rows <- c(1,3,5)

Now I want to use a for loop over the rows vector to add an empty row after each of those rows, using add_row() from the tibble package:

library(tibble)  # provides add_row()

for (i in rows) {
  df <- add_row(df, .after = i)
}

The problem is that while the first iteration works as intended, the other empty rows end up in the wrong place, because each insertion makes the dataframe one row longer and shifts the rows below it. To fix this I tried adding 1 to the vector after each iteration:

for (i in rows) {
  df <- add_row(df, .after = i)
  rows <- rows + 1
}

This does not work either. I assume the rows vector is only evaluated once, when the loop starts. Does anyone have any ideas?

Solution:

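Do it all at once; there is no need to loop. Build the sequence of existing row numbers, append the positions after which you want a new row, and sort, so that each of those row numbers appears twice. Index the dataframe with that vector, then overwrite the rows at the duplicated positions with NA: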

s <- sort(c(seq_len(nrow(df)), rows))
out <- df[s,]
out[duplicated(s),] <- NA

#    var1 var2
#1      1    9
#1.1   NA   NA
#2      2    8
#3      3    7
#3.1   NA   NA
#4      4    6
#5      5    5
#5.1   NA   NA
#6      6    4
#7      7    3
#8      8    2
#9      9    1
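
The same idea can be wrapped in a small function. This is only a sketch: `insert_blank_rows` is a hypothetical helper name, not part of any package, and resetting the row names is an added assumption (it just avoids labels such as `1.1` carrying over into a table).

```r
# Hypothetical helper (not from the original answer): insert an all-NA row
# after each position listed in `rows`.
insert_blank_rows <- function(df, rows) {
  stopifnot(all(rows >= 1), all(rows <= nrow(df)))
  # Each position in `rows` appears twice after sorting
  s <- sort(c(seq_len(nrow(df)), rows))
  out <- df[s, , drop = FALSE]
  # The second copy of each repeated row becomes the empty row
  out[duplicated(s), ] <- NA
  # Assumption: plain 1..n row names are preferred over "1.1", "3.1", ...
  rownames(out) <- NULL
  out
}

df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
                 var2 = c(9,8,7,6,5,4,3,2,1))
res <- insert_blank_rows(df, c(1, 3, 5))
which(is.na(res$var1))  # 2 5 8
nrow(res)               # 12
```

Because the empty rows are created by indexing, every column keeps its original type; the blanks are NA values rather than empty strings.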

This is much more efficient than looping, even for moderately sized data:

df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)

system.time({
  # Loop version; iterating in reverse keeps earlier positions valid
  for (i in rev(rows)) {
    df <- tibble::add_row(df, .after = i)
  }
})
#   user  system elapsed 
#  23.94    0.05   23.98 

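# Rebuild the original 9-row data frame first: the loop above overwrote df
df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9), var2 = c(9,8,7,6,5,4,3,2,1))
df <- df[rep(1:9,1e4),]
rows <- seq(1,9e4,100)

system.time({
  # Vectorised version: index once, then blank out the duplicated positions
  s <- sort(c(seq_len(nrow(df)), rows))
  out <- df[s,]
  out[duplicated(s),] <- NA
})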
#   user  system elapsed 
#   0.00    0.01    0.02 
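
On the question's guess about the loop: R evaluates the sequence of a for loop once, at the start, so changing `rows` inside the body does not change the values `i` takes. If a loop is still wanted, two options are to iterate in reverse (as the timing loop does with `rev(rows)`) or to shift each target by the number of rows already inserted. The sketch below illustrates both; the variable names `visited`, `shifted` and `na_pos` are made up for the example, and the last part only runs if tibble is installed.

```r
# for() evaluates its sequence once, so modifying `rows` in the body
# changes the variable but not the values that `i` takes.
rows <- c(1, 3, 5)
visited <- numeric(0)
for (i in rows) {
  visited <- c(visited, i)
  rows <- rows + 1
}
visited  # 1 3 5
rows     # 4 6 8

# Option 1: shift each target by the number of rows inserted before it
rows <- c(1, 3, 5)
shifted <- rows + seq_along(rows) - 1
shifted  # 1 4 7

# Option 2: insert from the bottom up, so earlier positions stay valid
na_pos <- NULL
if (requireNamespace("tibble", quietly = TRUE)) {
  df <- data.frame(var1 = c(1,2,3,4,5,6,7,8,9),
                   var2 = c(9,8,7,6,5,4,3,2,1))
  for (i in rev(rows)) {
    df <- tibble::add_row(df, .after = i)
  }
  na_pos <- which(is.na(df$var1))  # positions of the empty rows
}
```

Both loop variants still copy the dataframe on every iteration, which is why the vectorised version above is so much faster.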