I have 20+ data files with different file extensions (.csv, .xls, .xlsx), and I want to store each one as a data frame in a list. My "data" folder contains subfolders with the data files, so I wrote a for loop that calls a different reader function depending on the file extension. I was wondering if there is a way to read the files in and store them in a single list without using a for loop.
Below is my code:
library(dplyr)
library(readxl)
library(XML)
# grab all file names
files <- list.files(path = "data", recursive = TRUE, pattern = "\\.(csv|xls|xlsx)$", full.names = TRUE)
### example imported file names from above
files <- c("data/Bills/BillHistory_Account1.csv", "data/Bills/BillHistory_2.xls", "data/Usages/UsageHistory_3.xls", "data/Usages/UsageHistory_4.xls.xlsx")
# store each data frame in a list
df_list <- list()
for (i in seq_along(files)) {
  if (grepl("\\.csv$", files[[i]])) {
    df_list[[i]] <- read.csv(files[[i]], sep = "\t")
  } else if (grepl("\\.xls$", files[[i]])) {
    ### some .xls files actually contain html code
    ### and need to be read with XML::readHTMLTable
    df_list[[i]] <- readHTMLTable(files[[i]])$tblMain
  } else if (grepl("\\.xlsx$", files[[i]])) {
    df_list[[i]] <- read_excel(files[[i]])
  }
}
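For reference, the dispatch in the loop above can also be folded into a single lapply() call with switch() on the file extension. This is only a sketch using the same reader functions as the loop; the helper name read_one is illustrative, not part of any package.

```r
# Dispatch on the file extension with switch() instead of chained if/else.
# tools::file_ext() returns the extension without the leading dot.
read_one <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = read.csv(path, sep = "\t"),
    xls  = XML::readHTMLTable(path)$tblMain,  # .xls files that are really html
    xlsx = readxl::read_excel(path),
    stop("Unhandled extension: ", ext)
  )
}

# lapply() maps the helper over all file names, returning a list
df_list <- lapply(files, read_one)
```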
Thank you in advance for helping!
P.S. I was also wondering whether there is a way to create a reproducible example for reading in local data files. I can't think of anything other than having the reader create arbitrary local csv or excel files.
Solution:
The {rio} package is a universal wrapper around various import packages: a simple rio::import("filename.ext") guesses the file format from the extension, so the whole task might boil down to:
# grab all file names
files <- list.files(path = "data", recursive = TRUE, pattern = "\\.(csv|xls|xlsx)$", full.names = TRUE)
dataframe_list <- Map(files, f = \(filename) rio::import(filename))  # \(x) lambdas need R >= 4.1
If needed, you can still pass reader-specific named arguments (sep, header, skip, …) through import().
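For example, if the .csv files are really tab-separated (as the sep = "\t" in the question suggests), that argument can be forwarded only for the csv files. This is a sketch, not tested against the actual files; rio passes extra arguments through to the underlying reader, and an argument like sep would be rejected by the excel readers, hence the branch.

```r
# Forward sep = "\t" to the csv reader only; other formats take no extra args
dataframe_list <- Map(files, f = \(filename) {
  if (grepl("\\.csv$", filename)) {
    rio::import(filename, sep = "\t")
  } else {
    rio::import(filename)
  }
})
```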