Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Splitting invoices to different data frame and extracting CVSs

I have a data frame having a huge list of all invoices. I need to split the data frame based on the column "invoice numbers". Total invoices list is up to 230,000 however the unique list of invoice are 138 (i.e for eg invoice "A", there might be 30 products for an invoice) and each such invoices should be split into a different dataset/data frame and extracted to excel/csv.

I tried the below code,

#List of invoice dataframe
Invoices = pd.read_csv('D:/Invoice list.csv')

Invoices.shape 
# Output =  (230265, 25)

#List unique invoices (It is just an unique taken from the 'Invoice number' column)
List = pd.read_csv('D:List.csv')

List.shape
# Output = (138, 1)

Inv_1 = Invoices[Invoices['Invoice number'] == 'TS-AVRB-Jan22-10']

However that will help to extract only 1 invoice number at a time, is it possible to do for all the 138 invoices?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried using For loop as below,

for i in List:
    Inv = Invoices[Invoices['Invoice number'] == 'i']
    Inv.to_csv('i.csv')

I get an output of a blank CSV file and not individual 138 CSV files having a list of invoices

Sample CSV file link having sample dataset is provided in the below link
Link : https://drive.google.com/drive/folders/1B6IPsMAhXh9RvMlN-juFSTAJipuDYuZ-?usp=sharing

Help would be much appreciated!

>Solution :

There are a few syntax errors in the code. You have to pass the i in loop instead of string 'i'. Also for the filename, you have to pass the invoice number in the string to prevent overwriting the file with the same name. The full code would be as followed

#List of invoice dataframe
Invoices = pd.read_csv('D:/Invoice list.csv')

print(Invoices.shape)
# Output =  (230265, 25)

#List unique invoices (It is just an unique taken from the 'Invoice number' column)
invoice_num_list = pd.read_csv('D:/List.csv')

print(invoice_num_list.shape)
# Output = (138, 1)

for invoice_num in invoice_num_list:
    Inv = Invoices[Invoices['Invoice number'] == invoice_num]
    Inv.to_csv(f'{invoice_num}.csv')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading