How does .transform handle the split groups?

I have this dataframe:

import pandas as pd
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

print(df)

  subject   level
0       a    hard
1       a    None
2       b    None
3       b    easy
4       c    None
5       d  medium

When I run this code:

df.groupby('subject').transform(lambda group: print(group))

I got four printed groups. That's fine because we have four subjects: a, b, c and d.
But I don't understand group 2: it looks as if transform has accumulated the values of the first two groups. There is also a weird indentation that seems to separate the first group from the second one.

# ------------------------ group1
0    hard
1    None
Name: level, dtype: object
# ------------------------ group2
  level
0  hard
1  None
2    None
3    easy
Name: level, dtype: object
# ------------------------ group3
4    None
Name: level, dtype: object
# ------------------------ group4
5    medium
Name: level, dtype: object

Can someone please explain the logic to me?

>Solution :

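It's not accumulating anything. The extra block in your "group 2" is the first group printed a second time, this time as a DataFrame, because transform runs some checks on the first group to see the type of the output. In general you don't use transform for its side effects (you should use apply, as shown later), but rather to return something of the same shape as the input.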
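To make "the same shape as the input" concrete, here is a minimal sketch of the intended use. The `'count'` and `'size'` aggregations are chosen only for illustration and are not part of the question: each one reduces a group to a single value, and transform broadcasts that value back to every row of the group.

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

grouped = df.groupby('subject')['level']

# Number of non-null levels per subject, repeated on every row of the group.
non_null = grouped.transform('count')

# Number of rows per subject (nulls included), repeated the same way.
group_size = grouped.transform('size')

# Both results are aligned with the original index, so they can be
# assigned straight back as new columns.
result = df.assign(non_null=non_null, group_size=group_size)
print(result)
```

The result has one row per input row, which is what distinguishes transform from an aggregation that returns one row per group.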

What transform does with your function is easier to see with a custom function:

def f(group):
    print('---')
    print(group.name)  # with `transform` this shouldn't give the group name
    print(group)
    print('===')
    
df.groupby('subject').transform(f)

Output:

---                           # first group
level
0    hard
1    None
Name: level, dtype: object
===
---                           # internal pandas check (not a real group)
a
  level
0  hard
1  None
===
---                           # second group
level
2    None
3    easy
Name: level, dtype: object
===
---                           # third group
level
4    None
Name: level, dtype: object
===
---                           # fourth group
level
5    medium
Name: level, dtype: object
===

In comparison, apply does give the group names, and it is what you should use for this kind of operation:

df.groupby('subject').apply(f)

---
a
  subject level
0       a  hard
1       a  None
===
---
b
  subject level
2       b  None
3       b  easy
===
---
c
  subject level
4       c  None
===
---
d
  subject   level
5       d  medium
===

In short, don't use transform to manually work on groups.
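If the goal is only to look at each group, another option is to iterate over the groupby object directly, which yields one (key, sub-DataFrame) pair per group and involves no internal checks. A minimal sketch, where the `seen` list is a hypothetical helper used only to record what was visited:

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

seen = []  # hypothetical helper: records (group key, number of rows)

# Each iteration yields the group key and the rows belonging to it.
for name, group in df.groupby('subject'):
    print('---')
    print(name)
    print(group)
    seen.append((name, len(group)))
```

Each subject is visited exactly once, in sorted key order.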

Here is another example. In transform, group.name returns the name of the current Series (the column), not the group key. See what happens with multiple columns:

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium'],
                   'level2': ['hard', None, None, 'easy', None, 'medium']
                  })
df.groupby('subject').transform(lambda g: print(g.name))

Printed output:

level    # first group, column "level"
level2   # first group, column "level2"
a        # some internal check run only once
level    # second group, column "level"
level2   # second group, column "level2"
level    # etc.
level2
level
level2