Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculate different between rows with pandas.diff()

I’m trying to calculate different between each row but I have an issue that I don’t know how to solve. Below is my sample code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Place': ['Hanoi','Hanoi','Hanoi','Hanoi','Hanoi','Hochiminh','Hochiminh','Hochiminh','Hochiminh','Hochiminh'],
    'Date': ['2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05','2022-04-01','2022-04-02','2022-04-03','2022-04-04','2022-04-05'],
    'Number': [0,2,4,6,8,12,17,20,26,28]})

And it’s the Ouput:

Place   Date    Number
0   Hanoi   2022-04-01  0
1   Hanoi   2022-04-02  2
2   Hanoi   2022-04-03  4
3   Hanoi   2022-04-04  6
4   Hanoi   2022-04-05  8
5   Hochiminh   2022-04-01  12
6   Hochiminh   2022-04-02  17
7   Hochiminh   2022-04-03  20
8   Hochiminh   2022-04-04  26
9   Hochiminh   2022-04-05  28

And then I used diff to calculate different number:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['diff'] = df['Number'].diff()

and it’s Output:

    Place   Date    Number  diff
0   Hanoi   2022-04-01  0   NaN
1   Hanoi   2022-04-02  2   2.0
2   Hanoi   2022-04-03  4   2.0
3   Hanoi   2022-04-04  6   2.0
4   Hanoi   2022-04-05  8   2.0
5   Hochiminh   2022-04-01  12  4.0
6   Hochiminh   2022-04-02  17  5.0
7   Hochiminh   2022-04-03  20  3.0
8   Hochiminh   2022-04-04  26  6.0
9   Hochiminh   2022-04-05  28  2.0

As you see row 5 of column diff is calculating by subtract row 5 of column number and row 4 of column number but it’s not what I want.
I want row 5 will be reset to NaN or 0 because I want to calculate different number for each place.

This is my expected Output

    Place   Date    Number  diff
0   Hanoi   2022-04-01  0   NaN
1   Hanoi   2022-04-02  2   2.0
2   Hanoi   2022-04-03  4   2.0
3   Hanoi   2022-04-04  6   2.0
4   Hanoi   2022-04-05  8   2.0
5   Hochiminh   2022-04-01  12  0.0
6   Hochiminh   2022-04-02  17  5.0
7   Hochiminh   2022-04-03  20  3.0
8   Hochiminh   2022-04-04  26  6.0
9   Hochiminh   2022-04-05  28  2.0

Thank you.

>Solution :

df['diff'] = df.groupby('Place')['Number'].diff().fillna(0)

df

       Place        Date  Number  diff
0      Hanoi  2022-04-01       0   0.0
1      Hanoi  2022-04-02       2   2.0
2      Hanoi  2022-04-03       4   2.0
3      Hanoi  2022-04-04       6   2.0
4      Hanoi  2022-04-05       8   2.0
5  Hochiminh  2022-04-01      12   0.0
6  Hochiminh  2022-04-02      17   5.0
7  Hochiminh  2022-04-03      20   3.0
8  Hochiminh  2022-04-04      26   6.0
9  Hochiminh  2022-04-05      28   2.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading