Edit columns based on duplicate values found in Pandas

byMR

December 15, 2021

I have the dataframe below:

No:      Fee:
111      500
111      500
222      300
222      300
123      400

If a value in No is duplicated, I want to keep the fee only on the first row and blank it on the others.
The result should look like this:

    No:      Fee:
    111      500
    111      
    222      300
    222      
    123      400

I have no idea where to start, so any guidance is appreciated.

Thanks.

>Solution :

Use DataFrame.duplicated to build a boolean mask of the repeated rows, then set Fee to an empty string for those rows with DataFrame.loc:

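import pandas as pd

df = pd.DataFrame({'No': [111, 111, 222, 222, 123], 'Fee': [500, 500, 300, 300, 400]})

# True for rows that repeat an earlier (No, Fee) pair; first occurrences stay False
mask = df.duplicated(['No','Fee'])

df.loc[mask, 'Fee'] = ''
print (df)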
    No  Fee
0  111  500
1  111     
2  222  300
3  222     
4  123  400
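
To make the mask itself visible, here is a minimal sketch using the sample data from the question. The helper name `repeated_rows` is made up for illustration; `subset` and `keep` are parameters of `DataFrame.duplicated`. Checking only `No` matches the wording of the question, and on this data it gives the same mask as checking both columns.

```python
import pandas as pd


def repeated_rows(df, cols):
    """Return a boolean Series that is True where a row repeats an earlier row in `cols`."""
    # keep='first' (the default) leaves the first occurrence marked False
    return df.duplicated(subset=cols, keep='first')


df = pd.DataFrame({'No': [111, 111, 222, 222, 123],
                   'Fee': [500, 500, 300, 300, 400]})

mask_no = repeated_rows(df, ['No'])            # duplicates judged by No only
mask_both = repeated_rows(df, ['No', 'Fee'])   # duplicates judged by both columns

print(mask_no.tolist())  # [False, True, False, True, False]
```

If two rows could share a `No` but carry different fees, the two masks would differ, so pick the column list that matches what "duplicate" should mean for the data.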

But then the column is no longer numeric, because it now mixes numbers with strings:

print (df['Fee'].dtype)
object
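
If the column has already ended up as object, one possible way back to numbers is `pd.to_numeric` with `errors='coerce'`, which turns the blanks into NaN. This is only a sketch on a hand-built Series (the variable names are illustrative, not from the answer above):

```python
import pandas as pd

# A Fee column after the duplicates were blanked with empty strings
fee = pd.Series([500, '', 300, '', 400], dtype=object)

# Values that cannot be read as numbers become NaN
numeric_fee = pd.to_numeric(fee, errors='coerce')

print(numeric_fee.dtype)  # float64
```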

If you need the column to stay numeric, a possible solution is to use missing values instead:

import numpy as np

df.loc[mask, 'Fee'] = np.nan
print (df)
    No    Fee
0  111  500.0
1  111    NaN
2  222  300.0
3  222    NaN
4  123  400.0

print (df['Fee'].dtype)
float64

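To keep integers rather than floats, convert the column to the nullable integer dtype Int64 after setting the missing values:

df.loc[mask, 'Fee'] = np.nan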

df['Fee'] = df['Fee'].astype('Int64')
print (df)
    No   Fee
0  111   500
1  111  <NA>
2  222   300
3  222  <NA>
4  123   400

print (df['Fee'].dtype)
Int64
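
A variant of the same idea, sketched here as an assumption rather than as part of the original answer, converts to `Int64` first and then assigns `pd.NA`, so the column never passes through `float64`:

```python
import pandas as pd

df = pd.DataFrame({'No': [111, 111, 222, 222, 123],
                   'Fee': [500, 500, 300, 300, 400]})

mask = df.duplicated(['No', 'Fee'])

# Switch to the nullable integer dtype before introducing missing values
df['Fee'] = df['Fee'].astype('Int64')
df.loc[mask, 'Fee'] = pd.NA

print(df['Fee'].dtype)  # Int64
```

Aggregations such as `df['Fee'].sum()` skip the `<NA>` entries by default, so the remaining fees can still be totalled.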

pandas
