In the following example, what is the best way to group the rows so that a new column can be formed by subtracting the first year in each group from the current row's year? For example, the row with index 0 would be NaN, the row with index 1 would be 1, the row with index 2 would be 3, the row with index 4 would be 1, and so forth.
>>> import pandas as pd
>>> df = pd.DataFrame({'id': ['1', '1', '1', '2', '2', '3', '4', '4'],
... 'Year': [2000, 2001, 2003, 2004, 2005, 2002, 2001, 2003]})
>>> print(df)
  id  Year
0  1  2000
1  1  2001
2  1  2003
3  2  2004
4  2  2005
5  3  2002
6  4  2001
7  4  2003
>Solution :
Use transform('first') on Year to broadcast each id's first year to every row of its group, subtract that from the Year column to get the difference, and finally mask the values where the difference is 0:
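# Difference between each row's Year and the first Year of its id group
s = df['Year'] - df.groupby('id')['Year'].transform('first')
# Replace zero differences (the first row of each group) with NaN
df['col'] = s.mask(s == 0)
print(df)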
  id  Year  col
0  1  2000  NaN
1  1  2001  1.0
2  1  2003  3.0
3  2  2004  NaN
4  2  2005  1.0
5  3  2002  NaN
6  4  2001  NaN
7  4  2003  2.0
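
One caveat: masking on a difference of 0 also blanks any later row in a group that happens to have the same year as the first row. That cannot happen in the sample data, but if repeated years are possible, a variant that masks by position rather than by value may be safer. The following is a minimal sketch under that assumption; the name is_first is only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '1', '1', '2', '2', '3', '4', '4'],
                   'Year': [2000, 2001, 2003, 2004, 2005, 2002, 2001, 2003]})

# Difference between each row's Year and the first Year of its id group
diff = df['Year'] - df.groupby('id')['Year'].transform('first')

# cumcount numbers the rows within each group 0, 1, 2, ...,
# so 0 marks the first row of every group regardless of its value
is_first = df.groupby('id').cumcount() == 0

# Mask only the first row of each group, keeping genuine zero differences
df['col'] = diff.mask(is_first)
print(df)
```

On the sample data this gives the same col values as the approach above; the two only differ when a later row repeats the group's first year.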