Advertisements
I have the following function for getting the column name of last non-zero value of a row
import pandas as pd
def myfunc(X, Y):
df = X.iloc[Y]
counter = len(df)-1
while counter >= 0:
if df[counter] == 0:
counter -= 1
else:
break
return(X.columns[counter])
Using the following code example
data = {'id': ['1', '2', '3', '4', '5', '6'],
'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'GGG'],
'A1': [1, 1, 1, 0, 1, 1],
'B1': [0, 0, 1, 0, 0, 1],
'C1': [1, 0, 1, 1, 0, 0],
'A2': [1, 0, 1, 0, 1, 0]}
df = pd.DataFrame(data)
df
myfunc(df, 5) # 'B1'
I would like to know how can I apply this function to all rows in a dataframe, and put the results into a new column of df
I am thinking about looping across all rows (which probably is not the good approach) or using lambdas with apply function. However, I have not suceed with this last approach. Any help?
>Solution :
I’ve modified your function a little bit to work across rows:
def myfunc(row):
counter = len(row)-1
while counter >= 0:
if row[counter] == 0:
counter -= 1
else:
break
return row.index[counter]
Now just call df.apply
your function and axis=1
to call the function for each row of the dataframe:
>>> df.apply(myfunc, axis=1)
0 A2
1 A1
2 A2
3 C1
4 A2
5 B1
dtype: object
However, you can ditch your custom function and use this code to do what you’re looking for in a much faster and more concise manner:
>>> df[df.columns[2:]].T.cumsum().idxmax()
0 A2
1 A1
2 A2
3 C1
4 A2
5 B1
dtype: object