How to prevent a line from being consumed while iterating through a csv?

I have a very simple task to achieve:

1) Read an input csv file (that has a header section of variable length),
2) Do some basic transformations on it (as a dataframe), and
3) Write it out as a new csv (including the same header section as the input csv).

My code below works fine except for one annoying thing: the column names are missing from the new csv.

import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:

    extra = False
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
        elif extra:
            break

    df = pd.read_csv(f1)

    # some basic processing here to df like below
    df.iat[0, 1] = 0

    df.to_csv(f2, index=False)

The resulting csv looks like this (note that the line col1,col2,col3 is missing):

Title=foo
Date=16/08/2023
Category=bar
...
id1,0,2
id2,3,4
id3,5,6

Can you explain why, please? I suspect it has to do with my loop (hence the title of my question).

For reference, this is my input csv:

Title=foo
Date=16/08/2023
Category=bar
...
col1,col2,col3
id1,1,2
id2,3,4
id3,5,6

>Solution :

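The loop has already read the col1,col2,col3 line from f1 by the time it reaches break, so pd.read_csv(f1) starts after it and never sees the column names. That line is still available in the variable line, so why not reuse it?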

import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
        elif extra:
            break
    # line still holds the column-name row that the loop consumed
    df = pd.read_csv(f1, names=line.strip().split(','))
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)
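
To see why reusing line works, here is a minimal sketch of the same loop run against an in-memory stand-in for the file. The names sample, consumed and next_line are illustrative only, and io.StringIO is assumed to behave like the opened file for this purpose; the elided "..." line of the input is left out.

```python
import io

# In-memory stand-in for input_file.csv (contents taken from the question,
# without the elided "..." line).
sample = (
    "Title=foo\n"
    "Date=16/08/2023\n"
    "Category=bar\n"
    "col1,col2,col3\n"
    "id1,1,2\n"
    "id2,3,4\n"
)

f1 = io.StringIO(sample)
extra = False
for line in f1:
    if ',' not in line:
        extra = True
    elif extra:
        break

consumed = line             # the loop variable still holds the column-name row
next_line = f1.readline()   # what any later reader of f1 gets first

print(repr(consumed))
print(repr(next_line))
```

The loop stops on the column-name row, but only after reading it, so the next reader of f1 starts at id1,1,2. Passing the consumed row to read_csv through names restores the missing information.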

Alternatively, seek back to the position just before that line, so that read_csv can read it as the header itself:

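import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    pos = 0
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
            # f1.tell() raises OSError while iterating over f1, so the
            # offset of the output file is used as a proxy. This assumes the
            # bytes written to f2 so far match the bytes read from f1
            # (same encoding and line endings).
            pos = f2.tell()
        elif extra:
            # rewind f1 to the start of the column-name row
            f1.seek(pos)
            break
    df = pd.read_csv(f1)
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)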

Output df:

  col1  col2  col3
0  id1     0     2
1  id2     3     4
2  id3     5     6
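
The second variant above takes the rewind position from f2.tell(), because calling tell() on a text file inside a for loop over that file raises OSError. A possible alternative, sketched below, reads with readline() instead, which leaves tell() usable on the input itself. The helper name copy_preamble is hypothetical, and the sketch assumes (like the question's heuristic) that the header section contains no commas and that the first line with a comma is the column-name row.

```python
import io


def copy_preamble(src, dst):
    """Copy leading lines without a comma from src to dst.

    Leaves src positioned at the start of the first line that contains a
    comma (assumed to be the column-name row), or at end of file.
    """
    while True:
        pos = src.tell()        # usable here because readline() is used, not a for loop
        line = src.readline()
        if not line:            # end of file reached without finding a table
            return
        if ',' in line:
            src.seek(pos)       # rewind so the next reader sees this line
            return
        dst.write(line)


# Small in-memory check with the sample from the question
# (without the elided "..." line).
src = io.StringIO(
    "Title=foo\nDate=16/08/2023\nCategory=bar\ncol1,col2,col3\nid1,1,2\n"
)
dst = io.StringIO()
copy_preamble(src, dst)
remaining = src.read()

print(dst.getvalue(), end="")
print(remaining, end="")
```

With real files, calling copy_preamble(f1, f2) in place of the loop should leave f1 at the column-name row, so pd.read_csv(f1) and df.to_csv(f2, index=False) can follow as in the answer.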
