How does .transform handle the split groups?

October 16, 2023

I have this dataframe:

import pandas as pd
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
 'level': ['hard', None, None, 'easy', None, 'medium']})

print(df)

  subject   level
0       a    hard
1       a    None
2       b    None
3       b    easy
4       c    None
5       d  medium

When I run this code:

df.groupby('subject').transform(lambda group: print(group))

I got four printed groups. That’s OK because we have four subjects: a, b, c and d.
But I don’t understand group 2. I feel like transform has accumulated the values of the first two groups. Also, there is a weird indentation that seems to separate the first group from the second one.

# ------------------------ group1
0    hard
1    None
Name: level, dtype: object
# ------------------------ group2
  level
0  hard
1  None
2    None
3    easy
Name: level, dtype: object
# ------------------------ group3
4    None
Name: level, dtype: object
# ------------------------ group4
5    medium
Name: level, dtype: object

Can someone please explain the logic to me?

>Solution :

It isn’t accumulating anything: transform runs some internal checks on the first group to see the type of the output, and that extra call is what gets printed. In general you don’t use transform for its side effects (use apply for that, as shown later), but rather to return something of the same shape as the input.
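
As a minimal sketch of that intended use (the choice of 'count' is only an illustration and is not part of the question), transform can broadcast a per-group value back onto the original rows:

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

# Number of non-null levels per subject, repeated on every row of that subject.
counts = df.groupby('subject')['level'].transform('count')

# The result is aligned with the original index, one value per input row.
print(counts)
```

The result has one entry per row of df, which is what makes transform convenient for adding a group-level column next to the original data.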

What exactly happens might be more explicit with a custom function:

def f(group):
    print('---')
    print(group.name)  # with `transform` this shouldn't give the group name
    print(group)
    print('===')
    
df.groupby('subject').transform(f)

Output:

---                           # first group
level
0    hard
1    None
Name: level, dtype: object
===
---                           # internal pandas check (not a real group)
a
  level
0  hard
1  None
===
---                           # second group
level
2    None
3    easy
Name: level, dtype: object
===
---                           # third group
level
4    None
Name: level, dtype: object
===
---                           # fourth group
level
5    medium
Name: level, dtype: object
===

In comparison, apply does give the group names, and you can use it for this kind of operation:

df.groupby('subject').apply(f)

---
a
  subject level
0       a  hard
1       a  None
===
---
b
  subject level
2       b  None
3       b  easy
===
---
c
  subject level
4       c  None
===
---
d
  subject   level
5       d  medium
===

Don’t use `transform` to manually work on groups.
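
If the goal is only to inspect each group, a plain loop over the groupby object is another option. This is a sketch of that alternative, not something the original code used; the `seen` list is a hypothetical helper added here to record the group names:

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

seen = []  # hypothetical helper: collects the group names in iteration order

# Iterating a groupby object yields (group name, sub-DataFrame) pairs,
# with each group visited exactly once.
for name, group in df.groupby('subject'):
    seen.append(name)
    print('---')
    print(name)
    print(group)
    print('===')
```

Each group is visited exactly once here, so there is no extra call to explain away.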

Here is another example. In transform, group.name returns the current Series name. See what happens with multiple columns:

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium'],
                   'level2': ['hard', None, None, 'easy', None, 'medium']
                  })
df.groupby('subject').transform(lambda g: print(g.name))

Printed output:

level    # first group, column "level"
level2   # first group, column "level2"
a        # some internal check run only once
level    # second group, column "level"
level2   # second group, column "level2"
level    # etc.
level2
level
level2