Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Is there a way in python using Pandas to merge two rows when one column value is equal the another column value of the following line

I have the flowing problem. I have a data frame as follow:

Country_code    country    low_number    high_number
-----------------------------------------------------
AU              Australia  1             10
FR              France     2             45
AU              Australia  10            23
AU              Australia  30            43
AU              Australia  43            55
FR              France     45            55
FR              France     67            80
FR              France     80            98

I would like to get an output such as:

Country_code    country    low_number    high_number
-----------------------------------------------------
AU              Australia  1             23
FR              France     2             55
AU              Australia  30            55
FR              France     67            98

How can I merge rows with a similar country or country_code when the high_number is equal to the low_number of the next line with the same country ?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Note that my data frame is already order by low_number values.

My level in Pandas is low. However I already group by countries:

countriesGroup = df.groupby(['country'], as_index=False)

I then unsuccessfully tried to groupby on one specific country using a condition:

france = countriesGroup.get_group('France')
france.groupby(france['high_number'] == france['low_number'].shift(1)).agg( { 'high_number': max, 'low_number': min})

I don’t know how to keep all the other columns while replacing the low_number value by the min value and the high_number by the high calue using agg.

Thank you in advance!

>Solution :

You can merge the DataFrame with itself:

>>> df.rename(columns={"high_number":"key"}).merge(df.rename(columns={"low_number":"key"})).drop(columns="key")

  Country_code    country  low_number  high_number
0           AU  Australia         1.0         23.0
1           FR     France         2.0         55.0
2           AU  Australia        30.0         55.0
3           FR     France        67.0         98.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading