
Questions

python generator parsing one file at a time

December 15, 2021

I often have a folder containing a bunch of CSV, Excel, or HTML files.
I get tired of always writing a loop that iterates over the files in a folder and opens each one with the appropriate library, so I was hoping to build a generator that yields one file at a time, already opened with the appropriate library.
Here's what I had been hoping to do:

def __get_filename__(file):
    lst = str(file).split('\\')[-1].split('/')[-1].split('.')
    filename, filetype = lst[-2], lst[-1]
    return filename, filetype

def file_iterator(file_path, parser=None, sep=None, encoding='utf8'):
    import pathlib as pl
    if parser == 'BeautifulSoup':
        from bs4 import BeautifulSoup
    elif parser == 'pandas':
        import pandas as pd

    for file in pl.Path(file_path):
        if file.is_file():
            filename, filetype = __get_filename__(file)
            if filetype == 'csv' and parser == 'pandas':
                yield pd.read_csv(file, sep=sep)
            elif filetype == 'excel' and parser == 'pandas':
                yield pd.read_excel(path, engine='openpyxl')
            elif filetype == 'xml' and parser == 'BeautifulSoup':
                with open(file, encoding=encoding, errors='ignore') as xml:
                    yield BeautifulSoup(xml, 'lxml')
                    yield soup
            elif parser == None:
                print(filename, filetype)
                yield file

But my hopes and dreams are crushed 😛. If I do this:

for file in file_iterator(r'C:\Users\hwx756\Desktop\tmp/'):
    print(file)

it throws the error TypeError: 'WindowsPath' object is not iterable

I am sure there must be a way to do this, and I'm hoping that someone out there much smarter than me knows 🙂
Thanks!

Solution:

Here is what I think you should do.
First, get the names of all the files in your folder like this:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(folder_path) if isfile(join(folder_path, f))]

Then turn each file name into an absolute path and pass that absolute path to pandas to read the file.
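
As a rough sketch of that step, the listing and the absolute-path conversion can be wrapped in one small helper. The name list_files is hypothetical (it is not from the question), and the sorting is only there to make the order predictable:

```python
import os
from os import listdir
from os.path import isfile, join


def list_files(folder_path):
    """Return absolute paths of the regular files directly inside folder_path.

    list_files is a hypothetical helper name used only for this sketch.
    """
    # Resolve the folder once so every joined path is absolute.
    folder_path = os.path.abspath(folder_path)
    return [
        join(folder_path, name)
        for name in sorted(listdir(folder_path))  # sorted for a stable order
        if isfile(join(folder_path, name))        # skip sub-directories
    ]
```

Each returned path can then be handed to a reader such as pd.read_csv(path), assuming pandas is installed and the file really is a CSV.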

Also, your code has a typo:

        yield pd.read_excel(path, engine='openpyxl')

There is no variable named path in that function; it should be file.
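
If you would rather stay with pathlib, the TypeError from the question comes from looping over the Path object itself; Path.iterdir() is the method that yields the entries of a directory. Below is a minimal sketch of the generator with that change and the path typo fixed. A few things in it are my assumptions rather than anything from the question: Excel files are matched by the xlsx suffix (no file has the suffix excel), the stray yield soup is dropped because soup was never defined, the suffix comes from file.suffix instead of the string-splitting helper, BeautifulSoup uses its 'xml' parser (which needs lxml installed), and read_csv is given engine='python' so that sep=None lets pandas sniff the delimiter.

```python
import pathlib as pl


def file_iterator(file_path, parser=None, sep=None, encoding='utf8'):
    """Yield each file in file_path, opened with the requested parser."""
    # Import the optional libraries only when they are actually requested.
    if parser == 'BeautifulSoup':
        from bs4 import BeautifulSoup
    elif parser == 'pandas':
        import pandas as pd

    # A Path is not iterable; iterdir() yields the entries of the directory.
    for file in sorted(pl.Path(file_path).iterdir()):
        if not file.is_file():
            continue
        filetype = file.suffix.lower().lstrip('.')  # 'data.CSV' -> 'csv'
        if parser == 'pandas' and filetype == 'csv':
            # Assumption: the python engine is used so sep=None is sniffed.
            yield pd.read_csv(file, sep=sep, engine='python')
        elif parser == 'pandas' and filetype == 'xlsx':
            yield pd.read_excel(file, engine='openpyxl')
        elif parser == 'BeautifulSoup' and filetype == 'xml':
            with open(file, encoding=encoding, errors='ignore') as xml:
                soup = BeautifulSoup(xml, 'xml')  # parsed fully while open
            yield soup
        elif parser is None:
            # No parser requested: hand back the Path itself.
            yield file
```

With parser=None this simply yields the pathlib.Path of every file in the folder, so the loop from the question prints one path per file instead of raising. Files whose type does not match the chosen parser are skipped silently, which may or may not be what you want.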

pathlib

byMR

Published December 15, 2021
