YOY growth based on ID

I am trying to calculate year-over-year (YOY) growth for a variable in a Pandas dataframe. My data looks like this:

Year Country Industry Value
2000 USA Manufacturing 5
2000 Mexico Manufacturing 10
2001 Mexico Manufacturing 15
2002 Mexico Other 20

I have a different number of observations depending on the Country and Industry. Expected output:

Year Country Industry Value YOY
2000 USA Manufacturing 5 NaN
2000 Mexico Manufacturing 10 NaN
2001 Mexico Manufacturing 15 50%
2002 Mexico Other 20 NaN

I tried different things including:

df.groupby(['Country','Industry','Year'])['Value'].pct_change()

df['YOY'] = (df['Value'] - df.sort_values(by=['Country','Industry','Year']).groupby(['Country','Industry'])['Value'].shift(1)) / df['Value']

The first line calculates growth between rows without resetting for a new Country or Industry. The second one has incoherent results.

Any lead I could take? Thanks!!

>Solution :

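Group by Country and Industry only (not Year), so that pct_change compares each row with the previous row of the same group. Try this: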

df['YOY'] = df.groupby(['Country','Industry'])['Value'].pct_change().mul(100)

Output:

>>> df
   Year Country       Industry  Value   YOY
0  2000     USA  Manufacturing      5   NaN
1  2000  Mexico  Manufacturing     10   NaN
2  2001  Mexico  Manufacturing     15  50.0
3  2002  Mexico          Other     20   NaN
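
For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the sample data in the question. Two things in it are my own assumptions and not part of the answer above: the explicit sort by Year (the one-liner relies on rows already being in chronological order within each group) and the `YOY_manual` column, a hypothetical name used only to show the equivalent shift-based formula. Note the denominator is the previous value, not the current one as in the second attempt from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2002],
    "Country": ["USA", "Mexico", "Mexico", "Mexico"],
    "Industry": ["Manufacturing", "Manufacturing", "Manufacturing", "Other"],
    "Value": [5, 10, 15, 20],
})

keys = ["Country", "Industry"]

# Assumption: the input may not be ordered by Year, so sort within each group first
df = df.sort_values(keys + ["Year"])

# Percentage change against the previous row of the same Country/Industry group
df["YOY"] = df.groupby(keys)["Value"].pct_change().mul(100)

# Equivalent manual form: (current - previous) / previous
prev = df.groupby(keys)["Value"].shift(1)
df["YOY_manual"] = (df["Value"] - prev) / prev * 100

# Restore the original row order
df = df.sort_index()
print(df)
```

Keep in mind that pct_change compares against the previous row in the group, not the previous calendar year. If a group skips a year, the result is growth since the last available observation, so you may want to check the gap in Year before treating it as YOY.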