Apply the same function to several data frames in R

I'm currently working with 8 data frames that all have the same structure. I would like to know how to apply the same steps and modifications to all of them at once.

I know this is possible with the lapply function by putting the data frames in a list, but I can't work out how to write it.

The steps I need to perform are as follows:

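library(stringr)  # str_to_lower(), str_to_title()
library(stringi)  # stri_trans_general()
library(dplyr)    # select(), bind_rows(), distinct()

# dfn stands for each of the remaining data frames
# Lower-case the e-mail addresses
df1$EMAIL <- str_to_lower(df1$EMAIL)
df2$EMAIL <- str_to_lower(df2$EMAIL)
dfn$EMAIL <- str_to_lower(dfn$EMAIL)
df8$EMAIL <- str_to_lower(df8$EMAIL)

# Strip accents from the e-mail addresses
df1$EMAIL <- stri_trans_general(df1$EMAIL, "Latin-ASCII")
df2$EMAIL <- stri_trans_general(df2$EMAIL, "Latin-ASCII")
dfn$EMAIL <- stri_trans_general(dfn$EMAIL, "Latin-ASCII")
df8$EMAIL <- stri_trans_general(df8$EMAIL, "Latin-ASCII")

# Title-case the categories
df1$CATEGORY <- str_to_title(df1$CATEGORY)
df2$CATEGORY <- str_to_title(df2$CATEGORY)
dfn$CATEGORY <- str_to_title(dfn$CATEGORY)
df8$CATEGORY <- str_to_title(df8$CATEGORY)

# Keep only the three columns of interest
df1_e <- select(df1, EMAIL, CATEGORY, COMPANY)
df2_e <- select(df2, EMAIL, CATEGORY, COMPANY)
dfn_e <- select(dfn, EMAIL, CATEGORY, COMPANY)
df8_e <- select(df8, EMAIL, CATEGORY, COMPANY)

# Stack everything and keep one row per e-mail address
EMAILS <- bind_rows(df1_e, df2_e, dfn_e, df8_e) %>% distinct(EMAIL, .keep_all = TRUE)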

These are simple steps that don't take long to do one by one, but I would like to learn a more efficient way that keeps the script shorter and saves time.

Thanks in advance

Solution:

A general solution, as you have already identified, is to put the data frames in a list and use lapply() or purrr::map() to run the same steps on each of them.
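With base R's lapply(), that idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the answerer's code: the two small data frames are made-up stand-ins for df1 … df8, the helper name clean_one() is hypothetical, and stringi's stri_trans_tolower() and stri_trans_totitle() are used as equivalents of stringr's str_to_lower() and str_to_title() so that only one package is needed.

```r
library(stringi)

# Made-up stand-ins for df1 ... df8 (same columns in each, plus one extra column)
df1 <- data.frame(EMAIL = c("Jos\u00e9@Mail.com", "ana@mail.com"),
                  CATEGORY = c("retail", "FOOD"),
                  COMPANY = c("A", "B"),
                  OTHER = 1:2,
                  stringsAsFactors = FALSE)
df2 <- data.frame(EMAIL = c("ANA@mail.com", "luis@mail.com"),
                  CATEGORY = c("food", "tech"),
                  COMPANY = c("B", "C"),
                  OTHER = 3:4,
                  stringsAsFactors = FALSE)

# Hypothetical helper: every cleaning step written once, for a single data frame
clean_one <- function(x) {
  # lower-case, then transliterate accented letters to plain ASCII
  x$EMAIL <- stri_trans_general(stri_trans_tolower(x$EMAIL), "Latin-ASCII")
  x$CATEGORY <- stri_trans_totitle(x$CATEGORY)
  # keep only the three columns of interest
  x[, c("EMAIL", "CATEGORY", "COMPANY")]
}

# mget() looks the data frames up by name and returns them as a named list
dfs <- mget(paste0("df", 1:2))
cleaned <- lapply(dfs, clean_one)

# Stack the cleaned pieces and keep the first row for each e-mail address
EMAILS <- do.call(rbind, cleaned)
EMAILS <- EMAILS[!duplicated(EMAILS$EMAIL), ]
rownames(EMAILS) <- NULL
print(EMAILS)
```

With the real data you would use paste0("df", 1:8); the helper itself does not change, which is where the saving in script length comes from.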

Here's a solution using map_df() from purrr. If the data frames are named df1, df2, …, df8, you can use mget() to collect them into a named list. I have also added an id column that records which data frame each row came from.

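library(dplyr)
library(purrr)
library(stringr)   # str_to_lower(), str_to_title()
library(stringi)   # stri_trans_general()

# mget() returns a named list (df1, ..., df8); .id = 'id' stores those names in a column
EMAILS <- map_df(mget(paste0('df', 1:8)), function(x) {
  # transmute() keeps only the columns listed here
  x %>%
    transmute(EMAIL = str_to_lower(EMAIL) %>% stri_trans_general("Latin-ASCII"), 
              CATEGORY = str_to_title(CATEGORY), 
              COMPANY)
}, .id = 'id') %>% distinct(EMAIL, .keep_all = TRUE)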
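In purrr 1.0.0 and later, map_df() is superseded; the suggested replacement is map() followed by list_rbind(), whose names_to argument does the job of .id. The sketch below assumes purrr >= 1.0.0 and, like the answer, that every data frame has EMAIL, CATEGORY and COMPANY columns; the two small data frames are made-up stand-ins for df1 … df8.

```r
library(dplyr)
library(purrr)    # list_rbind() needs purrr >= 1.0.0
library(stringr)
library(stringi)

# Made-up stand-ins for df1 ... df8
df1 <- data.frame(EMAIL = c("Jos\u00e9@Mail.com", "ana@mail.com"),
                  CATEGORY = c("retail", "FOOD"),
                  COMPANY = c("A", "B"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(EMAIL = c("ANA@mail.com", "luis@mail.com"),
                  CATEGORY = c("food", "tech"),
                  COMPANY = c("B", "C"),
                  stringsAsFactors = FALSE)

# Named list of data frames; the names become the values of the id column
dfs <- mget(paste0("df", 1:2))

EMAILS <- dfs %>%
  map(function(x) {
    x %>%
      transmute(EMAIL = stri_trans_general(str_to_lower(EMAIL), "Latin-ASCII"),
                CATEGORY = str_to_title(CATEGORY),
                COMPANY)
  }) %>%
  list_rbind(names_to = "id") %>%    # replaces map_df(..., .id = 'id')
  distinct(EMAIL, .keep_all = TRUE)  # first row per e-mail address wins

print(EMAILS)
```

Here "ANA@mail.com" from df2 collapses onto "ana@mail.com" from df1, so only the df1 row is kept and its id stays "df1".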