Edit columns based on duplicate values found in Pandas

I have the DataFrame below:

No:      Fee:
111      500
111      500
222      300
222      300
123      400

If a value in No is duplicated, I want to keep the fee only on the first row and blank it out on the repeats.
The result should look like this:

    No:      Fee:
    111      500
    111      
    222      300
    222      
    123      400

I have no idea where to start, so any guidance would be appreciated.

Thanks.

Solution:

Use DataFrame.duplicated to build a boolean mask of the repeated rows, then set Fee to an empty string on those rows with DataFrame.loc:

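# Test for duplicates across both columns; use df.duplicated('No') to test No only
mask = df.duplicated(['No', 'Fee'])

# Cast to object first: newer pandas versions warn about, or reject,
# assigning a string into an integer column
df['Fee'] = df['Fee'].astype(object)
df.loc[mask, 'Fee'] = ''
print(df)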
    No  Fee
0  111  500
1  111     
2  222  300
3  222     
4  123  400
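
The question asks about duplicates in No only, while the mask above tests both columns. A minimal sketch of the No-only variant, assuming the first occurrence of each No should keep its fee (the name mask_no is just illustrative):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'No': [111, 111, 222, 222, 123],
                   'Fee': [500, 500, 300, 300, 400]})

# True for every repeat of a No value after its first occurrence
# (keep='first' is the default of DataFrame.duplicated).
mask_no = df.duplicated(subset='No')

print(mask_no.tolist())
# [False, True, False, True, False]
```

For this sample both masks select the same rows. They differ only when a repeated No carries a different Fee.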

The column is then no longer numeric, because it mixes numbers with strings:

print(df['Fee'].dtype)
object

If the column must stay numeric, one option is to use missing values instead:

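import numpy as np

# Starting again from the original integer column: cast to float so NaN fits
df['Fee'] = df['Fee'].astype('float64')
df.loc[mask, 'Fee'] = np.nan
print(df)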
    No    Fee
0  111  500.0
1  111    NaN
2  222  300.0
3  222    NaN
4  123  400.0

print(df['Fee'].dtype)
float64
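
One practical reason to prefer missing values over empty strings is that numeric aggregations keep working. A small sketch, assuming the sample data from the question (the names total and per_no are illustrative):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'No': [111, 111, 222, 222, 123],
                   'Fee': [500, 500, 300, 300, 400]})

# Cast to float up front so assigning NaN does not rely on implicit upcasting.
df['Fee'] = df['Fee'].astype('float64')
df.loc[df.duplicated(['No', 'Fee']), 'Fee'] = np.nan

# NaN is skipped by default (skipna=True), so each fee is counted once.
total = df['Fee'].sum()
per_no = df.groupby('No')['Fee'].sum()

print(total)
# 1200.0
```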

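If integers are preferred over floats, convert the column to the nullable integer dtype Int64, which shows missing values as <NA>:

df.loc[mask, 'Fee'] = np.nan
df['Fee'] = df['Fee'].astype('Int64')
print(df)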
    No   Fee
0  111   500
1  111  <NA>
2  222   300
3  222  <NA>
4  123   400

print(df['Fee'].dtype)
Int64
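
The same result can also be reached in a single expression. This is a sketch of an alternative, not part of the answer above: it assumes Series.mask is acceptable in place of the .loc assignment, and converts to Int64 first so the column never passes through float.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'No': [111, 111, 222, 222, 123],
                   'Fee': [500, 500, 300, 300, 400]})

# Convert to the nullable integer dtype, then replace the fee with <NA>
# wherever the row repeats an earlier (No, Fee) pair.
dup = df.duplicated(['No', 'Fee'])
df['Fee'] = df['Fee'].astype('Int64').mask(dup, pd.NA)

print(df['Fee'].dtype)
# Int64
```

Series.mask replaces values where the condition is True, which is the inverse of Series.where.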