Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

reading in local files with different extensions – efficient way?

I have a list of 20+ data files with different file extensions (.csv, .xls, .xlsx). I want to store each data file as a data frame in a list. I wrote a for loop to do the task.

My "data" folder contains subfolders with different data files. Then within the loop, I called different functions accordingly depending on the file extension types. I was wondering if there is a way to read in files and store them in a single list without using a for loop?

Below is my code:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(dplyr)
library(readxl)
library(XML)

# grab all file names
files <- list.files(path="data", recursive=T, pattern="*.(csv|xls|xlsx)$", full.names=T)

### example imported file names from above
files <- c("data/Bills/BillHistory_Account1.csv", "data/Bills/BillHistory_2.xls", "data/Usages/UsageHistory_3.xls", "data/Usages/UsageHistory_4.xls.xlsx")

# store each data frame in a list
df_list <- list()
for(i in seq_along(files)){
  if(grepl("*.csv$", files[[i]])){
    df_list[[i]] <- read.csv(files[[i]], sep="\t")
  } else if(grepl("*.xls$", files[[i]])){
    ### some .xls files actually contain html codes
    ### need to read using XML::readHTMLTable
    df_list[[i]] <- readHTMLTable(files[[i]])$tblMain
  } else if(grepl("*.xlsx$", files[[i]])){
    df_list[[i]] <- read_excel(files[[i]])
  }
}

Thank you in advance for helping!

P.S. I was also wondering if there is a way to create a reproducible example for reading in local data files? I can’t think of any means other than having the reader create arbitrary local csv, or excel files.

>Solution :

Package {rio} is a universal wrapper around various import packages. A simple import(filename.ext) will guess the file format, so the whole task might boil down to:

# grab all file names
files <- list.files(path="data", recursive=T, pattern="*.(csv|xls|xlsx)$", full.names=T)

dataframe_list <- Map(files, f = \(filename) rio::import(filename))

If needed, you can still add reader-specific named arguments (sep, header, skip …) to import.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading