How to prevent a line from being consumed while iterating through a csv?

I have a very simple task to achieve:

1) Read an input csv file (which starts with a header section of variable length),
2) Do some basic transformations on it (as a dataframe), and
3) Write it out as a new csv (including the same header section as the input csv).

My code below works fine except for one annoying thing: the column names are missing from the new csv.

import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:

    extra = False
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
        elif extra:
            break

    df = pd.read_csv(f1)

    # some basic processing here to df like below
    df.iat[0, 1] = 0

    df.to_csv(f2, index=False)

My actual new csv looks like this (note that the line col1,col2,col3 is missing):

Title=foo
Date=16/08/2023
Category=bar
...
id1,0,2
id2,3,4
id3,5,6

Can you explain why, please? I suspect it has to do with my loop (hence the title of my question).

This is my input csv, by the way:

Title=foo
Date=16/08/2023
Category=bar
...
col1,col2,col3
id1,1,2
id2,3,4
id3,5,6

Solution:

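By the time the loop reaches break, it has already read the col1,col2,col3 line from f1, so pd.read_csv(f1) starts after it and never sees the column names. Why not reuse line, which still holds that header, and pass it as names?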

import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
        elif extra:
            break
    # line still holds the consumed header (col1,col2,col3), so pass it as names
    df = pd.read_csv(f1, names=line.strip().split(','))
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)
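
To see why the column names go missing in the first place, here is a minimal sketch that replays the loop on an in-memory file. The io.StringIO object and its contents are stand-ins for input_file.csv (an assumption for illustration, with the elided lines left out), and the names consumed and next_line are hypothetical.

```python
import io

# Stand-in for input_file.csv, using the layout from the question.
sample = io.StringIO(
    "Title=foo\n"
    "Date=16/08/2023\n"
    "Category=bar\n"
    "col1,col2,col3\n"
    "id1,1,2\n"
    "id2,3,4\n"
)

extra = False
for line in sample:
    if ',' not in line:
        extra = True
    elif extra:
        break  # the column-name line has already been read here

consumed = line.strip()                 # the line that triggered the break
next_line = sample.readline().strip()   # what any later reader sees first

print(consumed)
print(next_line)
```

The loop variable keeps the column-name line, while the file itself has already moved on to the first data row, which is where a later read_csv call would start.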

Alternatively, seek back to the start of that line before handing the file to read_csv:

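import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    pos = 0
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
            # f1.tell() is disabled while iterating over f1, so use f2's offset;
            # this assumes both files share the same encoding and line endings
            pos = f2.tell()
        elif extra:
            f1.seek(pos)  # rewind to the start of the column-name line
            break
    df = pd.read_csv(f1)
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)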
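
Taking the offset from f2.tell() only works when the bytes written to f2 match the bytes read from f1. A possible variant, sketched below, reads with readline() instead of a for loop, so f1.tell() stays usable and the offset comes from the input file itself. The function name copy_preamble_then_load is hypothetical, io.StringIO stands in for the real files, and, unlike the loop above, this sketch stops at the first line containing a comma even if no header-section line came before it.

```python
import io
import pandas as pd

def copy_preamble_then_load(src, dst):
    """Copy leading comma-free lines from src to dst, then parse the rest."""
    while True:
        pos = src.tell()          # offset of the line about to be read
        line = src.readline()
        if not line or ',' in line:
            src.seek(pos)         # rewind so the column-name line is not lost
            break
        dst.write(line)
    return pd.read_csv(src)       # starts at the column-name line

# In-memory stand-ins for input_file.csv and output_file.csv.
src = io.StringIO(
    "Title=foo\nDate=16/08/2023\nCategory=bar\n"
    "col1,col2,col3\nid1,1,2\nid2,3,4\nid3,5,6\n"
)
dst = io.StringIO()

df = copy_preamble_then_load(src, dst)
df.iat[0, 1] = 0                  # same sample processing as in the question
df.to_csv(dst, index=False)
print(dst.getvalue())
```

With real files, the same function can be called with the two handles opened in the with statement.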

Output df:

  col1  col2  col3
0  id1     0     2
1  id2     3     4
2  id3     5     6