Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to add new columns considering previous years?

I wanted to add new columns based on other columns and also group by "x" and "y":

  • For the first new column (feature2), give the value of the "feature" two years ago.
  • For the "feature3", give the value of the "feature" three years ago.
  • For the "feature4", give the value of the "feature" four years ago.
  • (And also it will continue for 10 years)

You can find the sample dataset and expected output below.

data = {'x': [40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1, 40.1, 50.1 ], 'y': [100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1, 100.1, 110.1], 'year': [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003, 2004, 2004], 'feature': [0, 1, 1, 1, 1, 0, 0, 0, 0, 1]}   
df = pd.DataFrame(data)
df

      x       y     year  feature
0   40.1    100.1   2000    0
1   50.1    110.1   2000    1
2   40.1    100.1   2001    1
3   50.1    110.1   2001    1
4   40.1    100.1   2002    1
5   50.1    110.1   2002    0
6   40.1    100.1   2003    0
7   50.1    110.1   2003    0
8   40.1    100.1   2004    0
9   50.1    110.1   2004    1

Expected Output:
          x       y     year  feature   feature2  feature3   feature4
    0   40.1    100.1   2000    0         NaN       NaN        NaN
    1   50.1    110.1   2000    1         NaN       NaN        NaN
    2   40.1    100.1   2001    1         NaN       NaN        NaN
    3   50.1    110.1   2001    1         NaN       NaN        NaN
    4   40.1    100.1   2002    1          0        NaN        NaN
    5   50.1    110.1   2002    0          1        NaN        NaN
    6   40.1    100.1   2003    0          1         0         NaN
    7   50.1    110.1   2003    0          1         1         NaN
    8   40.1    100.1   2004    0          1         1          0
    9   50.1    110.1   2004    1          0         1          1

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Assuming the years are sorted and are all present in each group, you can use a loop and groupby.shift:

g = df.groupby(['x', 'y'])['feature']

for x in range(3):
    df[f'feature{x+2}'] = g.shift(x+2)

output:

      x      y  year  feature  feature2  feature3  feature4
0  40.1  100.1  2000        0       NaN       NaN       NaN
1  50.1  110.1  2000        1       NaN       NaN       NaN
2  40.1  100.1  2001        1       NaN       NaN       NaN
3  50.1  110.1  2001        1       NaN       NaN       NaN
4  40.1  100.1  2002        1       0.0       NaN       NaN
5  50.1  110.1  2002        0       1.0       NaN       NaN
6  40.1  100.1  2003        0       1.0       0.0       NaN
7  50.1  110.1  2003        0       1.0       1.0       NaN
8  40.1  100.1  2004        0       1.0       1.0       0.0
9  50.1  110.1  2004        1       0.0       1.0       1.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading