Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

XLSB to CSV with pandas, python

Trying to convert multiple XLSB files to CSV. Not sure what is the problem here

import os

import pandas as pd

path = r'C://Users//greencolor//Autoreport//Load_attachments//'
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        print(filename)                         #until here its working
        month = pd.read_excel(filename, sheet_name="Raw data ", engine="pyxlsb")
        print(month)                            # I get the error here
        month = month[month['Sales Manager'] == 'DEVON, JOHN'] #filtering by manager
        month.to_csv (path + filename + ".csv", index = None, header=True)

Error

FileNotFoundError: [Errno 2] No such file or directory: ‘PB orders Dec.xlsb’

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Why I get this error? print(filename) is printing all the XLSB files that name starts with PB orders

>Solution :

filename is just the file’s name, not the full path. You need to combine it with path to get the full path to the file. You can do that in a safe manner with os.path.join :

import os
...
for filename in os.listdir(path):
    if filename.startswith("PB orders"):
        full_path = os.path.join(path, filename)
        print(full_path )                         
        month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Searching with a pattern

An alternative is to use glob to search for files that match a pattern. You still need to generate the full path:

import glob

...

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    full_path = os.path.join(path, filename)
    print(full_path )                         
    month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")

Avoiding temp files

You still need to check the file name to avoid the temporary files generated when someone opens an Excel file (the files that start with ~) :

for filename in glob.glob("PB orders*.xlsb", root_dir=path):
    if not os.path.basename(filename).startswith("~"):
        full_path = os.path.join(path, filename)
            print(full_path )                         
            month = pd.read_excel(full_path , sheet_name="Raw data ", engine="pyxlsb")
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading