Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get first and last occurrence of duplicate value

I have a dataframe like this

index    col1     col2    col3    col4
  0      11/20    11/26   abc     35
  1      11/21    11/24   xxx     30
  2      11/22    11/27   abc     20

Here col3 has a same value (duplicated based on value).
I sum() col4 from rows based on the col3 value.

So in this case I do

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df = df.groupby(['col3'])[['col4']].sum()

But with this approach I get

index    col3    col4
  0      abc     55
  1      xxx     30

I would like to be able to preserve first occurrence of duplicated value’s col1(11/20) and last occurrence of col2(11/27), so the final product would looks like

index    col1     col2    col3    col4
  0      11/20    11/27   abc     55
  1      11/21    11/24   xxx     30

>Solution :

One way using pandas.DataFrame.groupby.agg:

new_df = df.groupby("col3", as_index=False).agg({"col1": "first",
                                                 "col2": "last",
                                                 "col4": "sum"})
print(new_df)

Output:

  col3   col1   col2  col4
0  abc  11/20  11/27    55
1  xxx  11/21  11/24    30
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading