Pandas groupby transform

I need to confirm how pandas GroupBy.transform behaves:

import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two'],
                   'C': [1, 5, 5, 2, 5, 5],
                   'D': [2.0, 5., 8., 1., 2., 9.]})
grouped = df.groupby('A')
# Standardize each column within its group (z-score)
grouped.transform(lambda x: (x - x.mean()) / x.std())

          C         D
0 -1.154701 -0.577350
1  0.577350  0.000000
2  0.577350  1.154701
3 -1.154701 -1.000000
4  0.577350 -0.577350
5  0.577350  1.000000

The call does not specify which columns the lambda function should be applied to. How does pandas decide which columns (in this case, C and D) to apply the function to? Why did it not apply the function to column B and throw an error?

Why does the output not include columns A and B?

Solution:

GroupBy.transform calls the specified function for each column in each group (so B, C, and D, but not A, because that is the column you are grouping by). However, the methods you are calling (mean and std) only work with numeric values, so pandas skips a column if its dtype is not numeric. String columns have dtype object, which is not numeric, so B gets dropped and you are left with C and D.
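To check which columns pandas considers numeric, a quick sketch along these lines may help. It rebuilds the df from the question; the variable name numeric_flags is only an illustration and is not part of the original post.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two'],
                   'C': [1, 5, 5, 2, 5, 5],
                   'D': [2.0, 5., 8., 1., 2., 9.]})

# Map each column name to whether pandas treats its dtype as numeric.
# numeric_flags is a hypothetical name used only for this illustration.
numeric_flags = {col: is_numeric_dtype(df[col]) for col in df.columns}
print(numeric_flags)
# {'A': False, 'B': False, 'C': True, 'D': True}
```

Only C and D are flagged as numeric, which matches the columns that appear in the transform output.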

You should have received a warning when you ran your code:

FutureWarning: Dropping invalid columns in DataFrameGroupBy.transform is deprecated. In a future version, a TypeError will be raised. Before calling .transform, select only columns which should be valid for the transforming function.

As the warning indicates, you need to select the columns you want to process before calling transform in order to avoid it. You can do that by adding [['C', 'D']] (to select, for example, your C and D columns) before the call to transform:

grouped[['C', 'D']].transform(lambda x: (x - x.mean()) / x.std())
#      ^^^^^^^^^^^^ important
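
If you would rather not hard-code the column names, one possible approach is to pick the numeric columns with select_dtypes and then join the result back onto the label columns. This is a minimal sketch assuming the df from the question; the names zscore, numeric_cols, z, and result are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two'],
                   'C': [1, 5, 5, 2, 5, 5],
                   'D': [2.0, 5., 8., 1., 2., 9.]})


def zscore(x):
    """Standardize values using the mean and sample std of their group."""
    return (x - x.mean()) / x.std()


# Select numeric columns by dtype instead of listing them by hand.
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Transform only the numeric columns, grouped by A.
z = df.groupby('A')[numeric_cols].transform(zscore)

# transform returns only the transformed columns, aligned on the original
# index, so join them back to A and B if you want those in the output.
result = df[['A', 'B']].join(z)
print(result)
```

Because the transformed frame keeps the original index, the join lines each standardized value up with its original row, which is how you get A and B back alongside C and D.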