python generator parsing one file at a time

I often have a folder with a bunch of CSV, Excel, HTML, etc. files.
I get tired of always writing a loop that iterates over the files in a folder and opens each one with the appropriate library, so I was hoping to build a generator that yields one file at a time, already opened with the appropriate library.
Here's what I had been hoping to do:

def __get_filename__(file):
    lst = str(file).split('\\')[-1].split('/')[-1].split('.')
    filename, filetype = lst[-2], lst[-1]
    return filename, filetype

def file_iterator(file_path, parser=None, sep=None, encoding='utf8'):
    import pathlib as pl
    if parser == 'BeautifulSoup':
        from bs4 import BeautifulSoup
    elif parser == 'pandas':
        import pandas as pd

    for file in pl.Path(file_path):
        if file.is_file():
            filename, filetype = __get_filename__(file)
            if filetype == 'csv' and parser == 'pandas':
                yield pd.read_csv(file, sep=sep)
            elif filetype == 'excel' and parser == 'pandas':
                yield pd.read_excel(path, engine='openpyxl')
            elif filetype == 'xml' and parser == 'BeautifulSoup':
                with open(file, encoding=encoding, errors='ignore') as xml:
                    yield BeautifulSoup(xml, 'lxml')
                    yield soup
            elif parser == None:
                print(filename, filetype)
                yield file

But my hopes and dreams are crushed 😛 If I do this:

for file in file_iterator(r'C:\Users\hwx756\Desktop\tmp/'):
    print(file)

it throws the error: TypeError: 'WindowsPath' object is not iterable

I am sure there must be a way to do this somehow and I’m hoping that someone out there much smarter than me knows 🙂
thanks!

Solution:

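The error comes from looping over the Path object itself: a pathlib.Path is not iterable, so you have to ask it for its contents (for example with Path.iterdir()) or list the folder some other way.
Here is what I think you should do. Get the names of all the files in your folder like this: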

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(folder_path) if isfile(join(folder_path, f))]

listdir only returns bare file names, so join each name with the folder path to get a full (absolute) path, and pass that path to pandas to read the file.
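As a minimal sketch of that step, the helper below joins each name back onto the folder and returns absolute paths. The function name absolute_file_paths is hypothetical (it is not from the question or from any library), and the sorting is only there to make the order predictable.

```python
from os import listdir
from os.path import abspath, isfile, join


def absolute_file_paths(folder_path):
    """Return absolute paths of the regular files directly inside folder_path."""
    # Resolve a relative folder against the current working directory.
    folder = abspath(folder_path)
    # listdir() returns bare names, so each one is joined back onto the folder.
    # Subfolders are skipped by the isfile() check.
    return [join(folder, name) for name in sorted(listdir(folder))
            if isfile(join(folder, name))]
```

Each returned path can then be handed to a reader, for example pd.read_csv(path), assuming pandas is installed and imported as pd.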

Your code also has a typo in this line:

        yield pd.read_excel(path, engine='openpyxl')

There is no variable named path; the loop variable is file, so pass that instead.
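Putting the fixes together, here is one possible sketch of the generator, not the only way to write it. It keeps pathlib but calls iterdir(), which is what makes the folder's contents iterable, and it uses Path.suffix instead of splitting the name by hand. The loaders argument is an assumption of this sketch: a dict that maps a lowercase suffix such as '.csv' to any callable that accepts a path. It replaces the parser string from the question so that the function itself does not depend on pandas or BeautifulSoup.

```python
from pathlib import Path


def file_iterator(folder, loaders=None):
    """Yield each regular file in folder, opened by the loader for its suffix.

    loaders (hypothetical parameter) maps a lowercase suffix such as '.csv'
    to a callable taking a Path. Files without a matching loader are
    yielded as plain Path objects.
    """
    loaders = loaders or {}
    # Path(folder) itself is not iterable; iterdir() yields its entries.
    for file in sorted(Path(folder).iterdir()):
        if not file.is_file():
            continue  # skip subfolders
        loader = loaders.get(file.suffix.lower())
        yield loader(file) if loader else file
```

With pandas installed, a call might look like file_iterator(folder, {'.csv': pd.read_csv, '.xlsx': pd.read_excel}). Note that Excel files end in .xlsx or .xls, so the filetype == 'excel' check in the question would never match, and the stray yield soup after the BeautifulSoup line refers to a name that is never assigned.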
