Converting iterrows into itertuples and accessing namedtuples

I'm trying to reduce the overhead of iterrows by switching to itertuples (the dataframe has many columns).

I'm trying to convert this function, which uses iterrows:

import pandas as pd

def named_tuple_issue_iterrows(df: pd.DataFrame, column_name: str):
    # iterrows yields (index, Series) pairs, so lookup by column label works
    for index, series in df.iterrows():
        result = series[column_name]
        # Do something with result

into one that uses itertuples:

def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        # Raises TypeError: tuple indices must be integers or slices, not str
        result = namedtuple[column_name]
        # Do something with result

The function doesn't know column_name beforehand, and it doesn't know the column's position either.
So namedtuple.column_a and namedtuple[1] are not usable solutions.

The real logic uses each row to construct another dataframe (based on other data), works out some more things, and then edits a third dataframe. The original dataframe itself is not changed in any way. I also want to access multiple unknown columns in the original frame.

Is there a way around this, or do I have to stay with iterrows when the column name is not known in advance?

Solution:

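You need to use getattr, which looks up an attribute by a name held in a string:

def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        # Same as namedtuple.<column_name>, with the name supplied at runtime
        result = getattr(namedtuple, column_name)
        # Do something with result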
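
One caveat with getattr: itertuples renames columns whose labels are not valid Python identifiers (for example, labels containing a space) to positional names such as _1, so getattr with the original label fails for those columns. A minimal sketch of a workaround, assuming column labels are unique, is to resolve the column's position once with df.columns.get_loc and then index the tuple by integer. The function name column_values_via_itertuples and the sample data are made up for illustration.

```python
import pandas as pd

def column_values_via_itertuples(df: pd.DataFrame, column_name: str) -> list:
    # Assumes unique column labels, so get_loc returns a single integer.
    # +1 because itertuples() places the index at position 0 by default.
    position = df.columns.get_loc(column_name) + 1
    results = []
    for row in df.itertuples():
        # Integer indexing works regardless of how the field was renamed
        results.append(row[position])
    return results

# Hypothetical data: "column a" is not a valid identifier
df = pd.DataFrame({"column a": [1, 2], "b": [3, 4]})
values = column_values_via_itertuples(df, "column a")
```

For several unknown columns, the same lookup can be done once per column before the loop starts, so the per-row cost stays a plain tuple index.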

Note that depending on what you really want to do, there might be a way to avoid the loop completely.
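
As a rough illustration of that note, here are two possible directions, not a drop-in replacement for the real logic. The doubling operation, the function names, and the sample data are all hypothetical. The first function applies an operation to a whole column selected by name, with no explicit loop. The second still loops, but only over the columns that are actually needed, as plain tuples.

```python
import pandas as pd

def doubled_column(df: pd.DataFrame, column_name: str) -> pd.Series:
    # Hypothetical per-row work (doubling), done on the whole column at once
    return df[column_name] * 2

def rows_for_columns(df: pd.DataFrame, column_names: list) -> list:
    # Select only the needed columns, then yield plain tuples
    # (name=None) without the index (index=False)
    return list(df[column_names].itertuples(index=False, name=None))

# Hypothetical data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
doubled = doubled_column(df, "a")
pairs = rows_for_columns(df, ["a", "c"])
```

With plain tuples, the values arrive in the same order as column_names, so they can be unpacked directly in the loop header.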