How to add new columns considering previous years?

June 15, 2022

I wanted to add new columns based on other columns and also group by "x" and "y":

For the first new column (feature2), give the value of the "feature" two years ago.
For the "feature3", give the value of the "feature" three years ago.
For the "feature4", give the value of the "feature" four years ago.
(And also it will continue for 10 years)

You can find the sample dataset and expected output below.

data = {'x': [40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1 ], 'y': [100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1], 'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004], 'feature': [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]}   
df = pd.DataFrame(data)
df

      x       y     year  feature
0   40.1    100.1   2000    0
1   50.1    110.1   2000    1
2   40.1    100.1   2001    1
3   50.1    110.1   2001    1
4   40.1    100.1   2002    1
5   50.1    110.1   2002    0
6   40.1    100.1   2003    0
7   50.1    110.1   2003    0
8   40.1    100.1   2004    0
9   50.1    110.1   2004    1

Expected Output:
          x       y     year  feature   feature2  feature3   feature4
    0   40.1    100.1   2000    0         NaN       NaN        NaN
    1   50.1    110.1   2000    1         NaN       NaN        NaN
    2   40.1    100.1   2001    1         NaN       NaN        NaN
    3   50.1    110.1   2001    1         NaN       NaN        NaN
    4   40.1    100.1   2002    1          0        NaN        NaN
    5   50.1    110.1   2002    0          1        NaN        NaN
    6   40.1    100.1   2003    0          1         0         NaN
    7   50.1    110.1   2003    0          1         1         NaN
    8   40.1    100.1   2004    0          1         1          0
    9   50.1    110.1   2004    1          0         1          1

>Solution :

Assuming the years are sorted and are all present in each group, you can use a loop and groupby.shift:

g = df.groupby(['x', 'y'])['feature']

for x in range(3):
    df[f'feature{x+2}'] = g.shift(x+2)

output:

      x      y  year  feature  feature2  feature3  feature4
0  40.1  100.1  2000        0       NaN       NaN       NaN
1  50.1  110.1  2000        1       NaN       NaN       NaN
2  40.1  100.1  2001        1       NaN       NaN       NaN
3  50.1  110.1  2001        1       NaN       NaN       NaN
4  40.1  100.1  2002        1       0.0       NaN       NaN
5  50.1  110.1  2002        0       1.0       NaN       NaN
6  40.1  100.1  2003        0       1.0       0.0       NaN
7  50.1  110.1  2003        0       1.0       1.0       NaN
8  40.1  100.1  2004        0       1.0       1.0       0.0
9  50.1  110.1  2004        1       0.0       1.0       1.0