I want to apply different diff() operations to different columns of a pandas DataFrame. Below is an example that uses a conditional expression inside a lambda to take diff(1) on col1 and diff(2) on col2.
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})
# diff(1) on col1, diff(2) on every other column
data.apply(lambda x: x.diff(1) if x.name == 'col1' else x.diff(2))
My first idea was a solution based on a dictionary, similar to what the agg function accepts. That would scale better when there are more than two columns. Does anyone know a handy way to apply different diff() operations to different columns?
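A minimal sketch of the dictionary idea that keeps the apply call from the question: the if/else is replaced by a lookup in a mapping from column name to the number of diff periods. The name `periods` is hypothetical, and every column of the frame is assumed to have an entry (a missing column would raise a KeyError).

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})

# Hypothetical mapping: column label -> periods passed to Series.diff
periods = {'col1': 1, 'col2': 2}

# apply() passes each column as a Series whose .name is the column label,
# so the dict lookup replaces the if/else from the original lambda
result = data.apply(lambda x: x.diff(periods[x.name]))
print(result)
```

Adding a column then only means adding one entry to `periods`.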
>Solution :
If every operation returns a Series of the same length as the original column (as diff and cumsum do), you can use DataFrame.agg with a dictionary:
df = data.agg({'col1': lambda x: x.diff(), 'col2': lambda x: x.diff(2)})
print(df)
     col1   col2
0     NaN    NaN
1    10.0    NaN
2    12.0   27.0
3     8.0    5.0
4    14.0    6.0
5     0.0    9.0
6    11.0   30.0
7    11.0   28.0
8    24.0    7.0
9   -11.0    1.0
10   21.0  130.0
11    2.0  133.0
12    0.0    2.0
13   22.0   22.0
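Because diff returns output with the same shape as its input, DataFrame.transform, which also accepts a dictionary of column label to function, may be a more natural fit than agg for this case. A sketch, assuming the same data as in the question:

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})

# transform() takes a dict of column label -> function, like agg(),
# but expects each function to return output aligned with its input column
out = data.transform({'col1': lambda x: x.diff(), 'col2': lambda x: x.diff(2)})
print(out)
```

This should print the same frame as the agg version. The difference is intent: transform should raise a ValueError if a function does not return output aligned with its column, so an accidental reduction is caught rather than silently mixed in.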
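Mixing a transformation with an aggregation that reduces a column to a single value, such as 'mean', does not work:
df = data.agg({'col1': lambda x: x.diff(), 'col2': 'mean'})
print(df)
ValueError: cannot perform both aggregation and transformation operations simultaneously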
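One way around that error, sketched under the assumption that you need both kinds of result: run the same-shape operations and the reducing operations as two separate calls, since their outputs have different shapes and cannot share one DataFrame. The variable names `diffs` and `means` are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'col1': [32, 42, 54, 62, 76, 76, 87, 98, 122, 111, 132, 134, 134, 156],
                     'col2': [32, 58, 59, 63, 65, 72, 95, 100, 102, 101, 232, 234, 234, 256]})

# Same-shape operations: one row per original row
diffs = data.transform({'col1': lambda x: x.diff()})

# Reducing operations: when every function returns a scalar,
# agg() returns a Series indexed by column label
means = data.agg({'col2': 'mean'})

print(diffs)
print(means)
```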