I'm trying to reduce the overhead of iterrows by switching to itertuples (the DataFrame has many columns).
I'm trying to turn this iterrows version:
def named_tuple_issue_iterrows(df: pd.DataFrame, column_name: str):
    for index, series in df.iterrows():
        result = series[column_name]
        # Do something with result
into this itertuples version:
def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        result = namedtuple[column_name]  # line throws error
        # Do something with result
The function doesn't know column_name beforehand, nor the position of that column, so namedtuple.column_a and namedtuple[1] are not usable solutions.
The real logic constructs another DataFrame for each row (based on other data), works out some more things, and then edits a third DataFrame. The original DataFrame itself is not changed in any way, and I need to access multiple columns whose names are not known in advance.
Is there a way around this, or do I need to use iterrows when the required column_name is not known in advance?
>Solution :
You need to use getattr:
def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        result = getattr(namedtuple, column_name)
        # Do something with result
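One caveat with getattr: itertuples renames columns whose labels are not valid Python identifiers (for example, labels containing spaces) to positional names such as _1, so getattr with the original label fails for them. A minimal sketch of a workaround, assuming the column labels are unique, is to look the position up once with df.columns.get_loc and index the tuple by position. The function name values_by_position and the sample frame are made up for illustration:

```python
import pandas as pd


def values_by_position(df: pd.DataFrame, column_name: str) -> list:
    # Assumption: column labels are unique, so get_loc returns a plain integer.
    pos = df.columns.get_loc(column_name)
    results = []
    # index=False keeps tuple positions aligned with df.columns.
    for row in df.itertuples(index=False):
        results.append(row[pos])  # positional access works for any label
    return results


# Hypothetical sample data with a label that is not a valid identifier.
df = pd.DataFrame({"column a": [1, 2], "b": [3, 4]})
values = values_by_position(df, "column a")
```

Because the lookup happens once outside the loop, this should cost about the same per row as getattr.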
Note that depending on what you really want to do, there might be a way to avoid the loop completely.
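If the per-row loop cannot be removed entirely, one option is to iterate over only the columns that are actually needed instead of building a tuple for every column. The following is a rough sketch under that assumption; iterate_selected_columns and the sample data are hypothetical, and whether it helps depends on what the loop body does:

```python
import pandas as pd


def iterate_selected_columns(df: pd.DataFrame, column_names: list) -> list:
    rows = []
    # zip walks the selected columns in lockstep, one value per column per row.
    for values in zip(*(df[name] for name in column_names)):
        # Pair each value with its column name so lookups by name still work.
        rows.append(dict(zip(column_names, values)))
    return rows


# Hypothetical sample data; only two of the three columns are read.
df = pd.DataFrame(
    {"column_a": [1, 2], "column_b": ["x", "y"], "column_c": [0.5, 1.5]}
)
selected = iterate_selected_columns(df, ["column_a", "column_c"])
```

In real code the dict would be used directly inside the loop rather than collected into a list.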