blank spaces in dataframe

I need help finding the best way to add a blank line between tables when writing a list of pandas DataFrames to a single CSV file inside a for loop:

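import tabula

# Read every table in the PDF; df is a list of DataFrames
df = tabula.read_pdf("out.pdf", pages='all')
with open('out.csv', 'a') as f:
    for x in df:
        # Append the current table to the CSV file
        x.to_csv('out.csv', mode='a', sep=',', index=False)
        # Intended to add a blank line after the table
        f.write('\n')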

The intended output would be as follows…

1,12,22,33,43,54,64,75,84,95
2,13,23,34,44,55,65,76,85,96
3,14,24,35,45,56,66,77,86,97
4,15,25,36,46,57,67,78,87,98
5,16,26,37,47,58,68,79,88,99
6,17,27,38,48,59,69,80,89,100
7,18,28,39,49,60,70,,90,
8,,29,,50,,71,,91,
9,19,30,40,51,61,72,81,92,101
10,20,31,41,52,62,73,82,93,102
11,21,32,42,53,63,74,83,94,103
1P,2P,3P,4P,5P,6P,7P,8P,9P,10P

104,115,124,135,144,155,165,176,186,197
105,116,125,136,145,156,166,177,187,198
106,117,126,137,146,157,167,178,188,199
107,118,127,138,147,158,168,179,189,200
108,119,128,139,148,159,169,180,190,201
109,120,129,140,149,160,170,181,191,202
110,,130,,150,161,171,182,192,203
111,,131,,151,,172,,193,
112,121,132,141,152,162,173,183,194,204
113,122,133,142,153,163,174,184,195,205
114,123,134,143,154,164,175,185,196,206
11P,12P,13P,14P,15P,16P,17P,18P,19P,20P

Instead, both newline characters are appended to the end of the file, after the last table.

I may have a fundamental misunderstanding of how append mode works in to_csv, and I would appreciate clarification on why the newlines are added to the end of the file instead of between the tables, where they are wanted.

A code-based alternative is also appreciated.

Thank you!

Solution:

This will work for you:

OUTPUT_FILE = 'out.csv'
for i, x in enumerate(df):
    x.to_csv(
        OUTPUT_FILE,
        mode='w' if i == 0 else 'a',
        sep=',',
        index=False
    )
    if i < len(df) - 1:
        with open(OUTPUT_FILE, 'a') as f:
            f.write('\n')
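
A possible variation, offered as a sketch rather than the only way to do it: open the file once and pass the open handle to to_csv, so the tables and the blank lines all go through the same handle in order. The function name write_tables and the two sample DataFrames are hypothetical stand-ins for the list returned by tabula.read_pdf. The handle is opened with newline='' so that pandas controls the line endings, and os.linesep is written for the blank line to match the default line ending to_csv uses.

```python
import os
import tempfile

import pandas as pd


def write_tables(tables, path):
    """Write each DataFrame in `tables` to `path`, separated by one blank line."""
    # newline='' disables newline translation so pandas controls line endings
    with open(path, "w", newline="") as f:
        for i, table in enumerate(tables):
            if i > 0:
                # Blank line between tables, but not after the last one
                f.write(os.linesep)
            # Passing the handle (not the file name) keeps all writes in order
            table.to_csv(f, sep=",", index=False)


# Hypothetical sample data standing in for the tables read from the PDF
tables = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    pd.DataFrame({"c": [5, 6], "d": [7, 8]}),
]
demo_path = os.path.join(tempfile.mkdtemp(), "out_demo.csv")
write_tables(tables, demo_path)
```

Because only one handle is ever open on the file, there is no second buffer whose contents could arrive late.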

I suspect what is happening in your example is that two separate handles are open on the same file at the same time. Each x.to_csv('out.csv', mode='a') call opens its own handle, writes the table, and closes it, so the tables reach the disk right away. The f.write('\n') calls, by contrast, only fill the buffer of f, which is flushed when the with block exits, so both newline characters land after the last table.
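
The buffering behaviour can be reproduced without pandas. The sketch below uses only the standard library; the function name demo_two_handles and the sample strings are made up for illustration, and the inner short-lived handle merely stands in for what a to_csv call with a file name does.

```python
import os
import tempfile


def demo_two_handles(path):
    """Mimic the question's pattern: one long-lived handle plus a fresh handle per table."""
    tables = ["a,b\n1,2\n", "c,d\n3,4\n"]
    with open(path, "a") as f:          # long-lived handle, like `f` in the question
        for text in tables:
            with open(path, "a") as g:  # short-lived handle, standing in for to_csv
                g.write(text)           # written out when g is closed
            f.write("\n")               # stays in f's buffer for now
    # f is closed here, so its buffered newlines are flushed last
    with open(path) as result:
        return result.read()


path = os.path.join(tempfile.mkdtemp(), "demo.csv")
content = demo_two_handles(path)
print(repr(content))  # 'a,b\n1,2\nc,d\n3,4\n\n\n'
```

Both newlines end up after the second table, which matches the output described in the question.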
