Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Remove rows from dataframe if A column value is in C column

I have this code:

test = {"number": ['1555','1666','1777', '1888'],
        "order_amount": ['100.00','200.00','-200.00', '300.00'],
        "number_of_refund": ['','','1666', '']
    }

df = pd.DataFrame(test)

Which returns the following dataframe:

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00                 

What would be the best solution to remove the row if the order number is refunded? I would want to remove the order row and the refund row.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Logic if df[‘number’].value is in df[‘number_of_refund’] and the amount of df[‘number’].value is the opposite of the df[‘number_of_refund’] rows.

So the result in this case should be:

number order_amount number_of_refund
0   1555       100.00                 
1   1888       300.00                 

>Solution :

You can use boolean indexing with two masks:

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# is the number order matching a refund?
m2 = df['number'].isin(df.loc[m1, 'number_of_refund'])

# keep rows that do not match any of the above
out = df[~(m1|m2)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
3   1888         300.0                 

partial refunds:

df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# get mapping of refunds
s = df[m1].set_index('number_of_refund')['order_amount']
# update amounts
df['order_amount'] += df['number'].map(s).fillna(0)
# find null values
m2 = df['order_amount'].eq(0)


# keep rows that do not match any mask
df = df[~(m1|m2)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
1   1666         100.0                 
3   1888         300.0                 

input for partial refund:

test = {"number": ['1555','1666','1777', '1888'],
        "order_amount": ['100.00','200.00','-100.00', '300.00'],
        "number_of_refund": ['','','1666', '']
    }

df = pd.DataFrame(test)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading