Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculating the difference between the first non-na value and the last na-value based on a grouped condition

I am looking to calculate the percentage increase or decrease between the first and last non-na value for the following dataset:

Year Company Data
2019 X 341976.00
2020 X 1.000
2021 X 282872.00
2019 Y NaN
2020 Y NaN
2021 Y NaN
2019 Z 4394.00
2020 Z 173.70
2021 Z 518478.00

As I want the relative change I would expect the formula to do something like:

(last non-na value)/(first non-na value)-1

This should return something like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Year Company Data Data
2019 X 341976.00 NaN
2020 X 1.000 NaN
2021 X 282872.00 -0.17
2019 Y NaN NaN
2020 Y NaN NaN
2021 Y NaN NaN
2019 Z 4394.00 NaN
2020 Z 173.70 NaN
2021 Z 518478.00 11.700

I have tried to combine groupby based on the company field with the first_valid_index but havent had any luck finding a solution. What is the most efficient way of calculating the relative change as above?

>Solution :

If aggregate GroupBy.first and
GroupBy.last it omit missing values, so is possible divide values and subtract 1:

s = df.groupby('Company')['Data'].agg(['last','first']).eval('last / first').sub(1)

Then found index values for last non missing values per Company:

idx = df.dropna(subset=['Data']).drop_duplicates(['Company'], keep='last').index

And mapping only matchded rows by Series.map:

df.loc[idx, 'Date'] = df.loc[idx, 'Company'].map(s)
print (df)

   Year Company      Data        Date
0  2019       X  341976.0         NaN
1  2020       X       1.0         NaN
2  2021       X  282872.0   -0.172831
3  2019       Y       NaN         NaN
4  2020       Y       NaN         NaN
5  2021       Y       NaN         NaN
6  2019       Z    4394.0         NaN
7  2020       Z     173.7         NaN
8  2021       Z  518478.0  116.996814
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading