I have a very simple task to achieve:
1) Read an input csv file (which starts with a header section of variable length),
2) Do some basic transformations on it (as a dataframe), and
3) Write it out as a new csv (including the same header section as the input csv).
My code below works fine except for one annoying thing: the column names are dropped when writing the new csv.
import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
        elif extra:
            break
    df = pd.read_csv(f1)
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)
My actual new csv looks like this (note that the line col1,col2,col3 is missing):
Title=foo
Date=16/08/2023
Category=bar
...
id1,0,2
id2,3,4
id3,5,6
Can you explain why, please? I suspect it has to do with my loop (hence the title of my question).
This is my input csv, by the way:
Title=foo
Date=16/08/2023
Category=bar
...
col1,col2,col3
id1,1,2
id2,3,4
id3,5,6
>Solution :
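Why don't you reuse line? By the time the loop breaks, it has already read the col1,col2,col3 row into line, so pd.read_csv(f1) starts at the first data row and never sees the column names. You can pass them in explicitly: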
import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
        elif extra:
            break
    # line still holds the column-name row that ended the loop
    df = pd.read_csv(f1, names=line.strip().split(','))
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)
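To see why the column names go missing in the original loop, here is a minimal sketch that runs the same loop logic on an in-memory copy of the sample input (without the elided "..." line). The names sample and consume_preamble are hypothetical, introduced only for this illustration, and io.StringIO stands in for the real file.

```python
import io

# In-memory stand-in for input_file.csv (assumption: same layout as the sample input).
sample = (
    "Title=foo\n"
    "Date=16/08/2023\n"
    "Category=bar\n"
    "col1,col2,col3\n"
    "id1,1,2\n"
    "id2,3,4\n"
    "id3,5,6\n"
)

def consume_preamble(handle):
    """Run the same loop as in the question and return the last line it read."""
    extra = False
    last_line = None
    for line in handle:
        last_line = line
        if ',' not in line:
            extra = True
        elif extra:
            # The line that triggers the break has already been consumed.
            break
    return last_line

f1 = io.StringIO(sample)
last_line = consume_preamble(f1)
remaining = f1.read()  # what a later reader of f1 would get

print(repr(last_line))
print(repr(remaining))
```

The loop has to read the column-name row in order to test it, so that row ends up in last_line and anything that reads from the handle afterwards starts at the first data row.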
Alternatively, move back to the position before the line:
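import pandas as pd

with open('input_file.csv', 'r') as f1, open('output_file.csv', 'w') as f2:
    extra = False
    pos = 0
    for line in f1:
        if ',' not in line:
            extra = True
            f2.write(line)
            # f1.tell() is disabled while iterating over f1, so use f2 instead:
            # f2 contains exactly the lines read so far, so its position matches
            # f1's offset (assuming both files share encoding and line endings)
            pos = f2.tell()
        elif extra:
            # rewind f1 to the start of the column-name row
            f1.seek(pos)
            break
    df = pd.read_csv(f1)
    # some basic processing here to df like below
    df.iat[0, 1] = 0
    df.to_csv(f2, index=False)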
Output df:
  col1  col2  col3
0  id1     0     2
1  id2     3     4
2  id3     5     6
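A third option, sketched here under a few assumptions, is to avoid mixing a for loop and pd.read_csv on the same handle altogether: read all the lines first, find the first line containing a comma, and hand only the table part to pandas. This assumes the file fits in memory, that no preamble line contains a comma (the same assumption the loop above makes), and that at least one line does contain a comma. The helper name split_preamble_and_table and the sample text are hypothetical, and io.StringIO stands in for the real files.

```python
import io

import pandas as pd

# In-memory stand-in for input_file.csv (assumption: same layout as the sample input).
sample = (
    "Title=foo\n"
    "Date=16/08/2023\n"
    "Category=bar\n"
    "col1,col2,col3\n"
    "id1,1,2\n"
    "id2,3,4\n"
    "id3,5,6\n"
)

def split_preamble_and_table(text):
    """Return (preamble_lines, dataframe) for text with a comma-free preamble."""
    lines = text.splitlines(keepends=True)
    # First line containing a comma is assumed to be the column-name row.
    header_idx = next(i for i, line in enumerate(lines) if ',' in line)
    preamble = lines[:header_idx]
    df = pd.read_csv(io.StringIO(''.join(lines[header_idx:])))
    return preamble, df

preamble, df = split_preamble_and_table(sample)
df.iat[0, 1] = 0  # same example transformation as above

out = io.StringIO()  # stands in for output_file.csv
out.writelines(preamble)
df.to_csv(out, index=False)
result = out.getvalue()
print(result)
```

With real files, the same idea would apply by reading the input with open(...).read() and writing result to the output path. The trade-off is holding the whole file in memory in exchange for not having to track file positions.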