Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Remove the last x% of rows in each df, in a list of dfs, dependent upon a numeric vector

I have a list of dataframes where each dataframe contains the same number of rows. I have a numeric vector (removal) that specifies how much I want to remove from each dataframe (e.g., remove the last 25% from the first df, df1).

I can keep the first x% of rows from each df by creating a function and using lapply, but I don’t know how to incorporate that code into a loop that would loop through removal. Any help is appreciated.

library(dplyr)
df1 <- data.frame(var = sample(1:5, 100, replace = TRUE))
df2 <- data.frame(var = sample(1:5, 100, replace = TRUE))
df3 <- data.frame(var = sample(1:5, 100, replace = TRUE))
lst <- list(df1, df2, df3)

head(df1)
#>   var
#> 1   3
#> 2   2
#> 3   2
#> 4   4
#> 5   5
#> 6   2

removal <- c(.25, .30, .50)
# I want to remove the last 25% of rows from df1,
# the last 30% of rows from df2,
# and the last 50% of rows from df3

# I can only make it so I can keep a static percentage from each df
# from the top of each df
# but I don't know to incorporate this code with a loop
# that would loop through `removal`
fake_removal <- .75

fake_remv <- function(x){
  x <- x %>% filter(row_number() < nrow(x) * fake_removal) 
  return(x)
  }

badlst <- lapply(lst, fake_remv)
print(nrow(badlst[[1]]))
#> [1] 74
head(badlst[[1]])
#>   var
#> 1   3
#> 2   2
#> 3   2
#> 4   4
#> 5   5
#> 6   2

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

We may need Map/mapply in base R or map2 from tidyverse

library(dplyr)
library(purrr)
map2(lst, removal, ~ .x %>% 
        filter(row_number() < nrow(.x) * .y))

The equivalent Map option would be

Map(function(dat, rml) subset(dat, seq_len(nrow(dat)) < nrow(dat) * rml),
         lst, removal)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading