Suppose we have a pandas dataframe:
import pandas as pd
data = pd.DataFrame({'columnNM': ['Jerry', 'Bob', 'Phil', 'Bill', 'Mickey', 'Pigpen', 'Robert'],
'columnNM2': ['John', 'Tom', 'Donna', 'Keith', 'Brent', 'Vince', 'Bruce']})
Also suppose we have an open file we are writing to, something opened using:
file = open('myPathExample', 'w')
I want to perform comparison operations and control flow on the data, then write the results to that file. A simple example would be:
for row in data.itertuples():
    file.write('%s was friends with %s \n' % (row.columnNM, row.columnNM2))
Now, I am a beginner in Python, and I have read in many places that looping or iterating over the rows of a pandas dataframe is not ideal, especially for large datasets. I don't know enough to fully understand why.
Is a good vectorized alternative to itertuples for this example even possible? If so, what is it?
>Solution :
The vectorized alternative would be to build a single string and write it to the file once:
file.write('\n'.join(data['columnNM']+' was friends with '+data['columnNM2']))
Or, if you want to keep the loop:
for line in (data['columnNM']+' was friends with '+data['columnNM2']+' \n'):
    file.write(line)
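For reference, here is a minimal self-contained sketch of the single-write approach. The helper names `friendship_lines` and `write_friendships` and the `buffer` variable are hypothetical, chosen only for illustration, and an in-memory `io.StringIO` stands in for the open file so the example runs without touching disk. Note that `'\n'.join(...)` does not add a newline after the last row, so this sketch appends one explicitly.

```python
import io

import pandas as pd

data = pd.DataFrame({
    'columnNM': ['Jerry', 'Bob', 'Phil'],
    'columnNM2': ['John', 'Tom', 'Donna'],
})


def friendship_lines(df):
    # Hypothetical helper: element-wise string concatenation of the
    # two columns, returning a pandas Series with one sentence per row.
    return df['columnNM'] + ' was friends with ' + df['columnNM2']


def write_friendships(df, fh):
    # Hypothetical helper: join all rows into one string and write once.
    # The trailing '\n' is added because join() only separates the rows.
    fh.write('\n'.join(friendship_lines(df)) + '\n')


buffer = io.StringIO()  # stands in for a file opened with open(path, 'w')
write_friendships(data, buffer)
print(buffer.getvalue())
```

With a real file, the same function could be called inside `with open('myPathExample', 'w') as file:` so the file is closed automatically once the block ends.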