Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Flag if a row is duplicated an attach if it's the 1st, 2nd, etc duplicated row

I’d like to flag if a row is duplicated, and attach if it’s the 1st, 2nd, 3rd, etc duplicated column in a Pandas DataFrame.

More visually, I’d like to go from:

id Country City
1 France Paris
2 France Paris
3 France Lyon
4 France Lyon
5 France Lyon

to

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

id Country City duplicated_flag
1 France Paris 1
2 France Paris 1
3 France Lyon 2
4 France Lyon 2
5 France Lyon 2

Note that id is not taken into account to see if the row is duplicated.

>Solution :

Two options:

First, if you have lots of columns that you need to compare, you can use:

comparison_df = df.drop("id", axis=1)
df["duplicated_flag"] = (comparison_df != comparison_df.shift()).any(axis=1).cumsum()

We drop the columns that aren’t needed in the comparison. Then, we check whether each row is equivalent to the one above it using .shift() and .any(). Finally, we read off the value of duplicated_flag using .cumsum().


But, if you only have two columns to compare (or if for some reason you have lots of columns that you need to drop), you can find mismatched rows one at a time, and then use .cumsum() to get the value of duplicated_flag for each row. It’s a bit more verbose so I’m not super happy with this option, but I’m leaving this here for completeness in case this suits your use case better:

country_comparison = df["Country"].ne(df["Country"].shift())
city_comparison = df["City"].ne(df["City"].shift())
df["duplicated_flag"] = (country_comparison | city_comparison).cumsum()
print(df)

These output:

     id    Country      City  duplicated_flag
0  1     France     Paris                   1
1  2     France     Paris                   1
2  3     France      Lyon                   2
3  4     France      Lyon                   2
4  5     France      Lyon                   2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading