Different ways to conditionally drop rows in Pandas

I have a DataFrame with a column (AE) that can contain: nothing (""), "X", "A" or "E".
I want to drop all the rows that have an "X" in that column.

Searching here on StackOverflow, I found 2 ways of doing it:

df = df.drop(df[df.AE == "X"].index)

or

df = df[df["AE"] != "X"]

But for some reason, the first way drops more rows than it should.

Do those two lines of code do the same thing?
Is there some big mistake I'm making when trying to do this "drop" with the first command?

Solution:

They are not the same.

df = df.drop(df[df.AE == "X"].index)

This drops rows by their index label. If the index is not unique, the labels of the rows where df["AE"] == "X" may be shared with other rows, and those rows are dropped as well.
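As a minimal sketch of how this can happen, assume the DataFrame was built by concatenating two pieces without resetting the index (the data and the names part1 and part2 are made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical data: two frames concatenated without ignore_index=True,
# so the index labels 0 and 1 each appear twice.
part1 = pd.DataFrame({"AE": ["X", "A"]})
part2 = pd.DataFrame({"AE": ["E", ""]})
df = pd.concat([part1, part2])

print(df.index.tolist())  # [0, 1, 0, 1]

# Only one row has AE == "X", but its label 0 is shared with the "E" row,
# so drop() removes both rows labelled 0.
dropped = df.drop(df[df.AE == "X"].index)
print(dropped["AE"].tolist())  # ['A', '']
```

One row matched the condition, yet two rows were removed, which would explain the "drops more rows than it should" symptom if the real index has duplicates.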

df = df[df["AE"] != "X"]

Here we are filtering the DataFrame with a boolean mask and keeping all rows where df["AE"] is different from "X". The index values play no part, and nothing is actually dropped: we simply keep the rows that meet the criterion.
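A small sketch of the mask-based approach on a frame with duplicated index labels (the data is hypothetical). It also shows one possible way to keep using drop() safely, assuming the original index values are not needed, by making the index unique first with reset_index(drop=True):

```python
import pandas as pd

# Hypothetical frame where the index labels 0 and 1 each appear twice.
df = pd.DataFrame({"AE": ["X", "A", "E", ""]}, index=[0, 1, 0, 1])

# The boolean mask is evaluated row by row, so only the "X" row is removed.
kept = df[df["AE"] != "X"]
print(kept["AE"].tolist())  # ['A', 'E', '']

# Checking the index explains why drop() would misbehave on this frame.
print(df.index.is_unique)  # False

# Assumption: the old index can be discarded. With a fresh unique index,
# drop() removes only the matching row.
unique_df = df.reset_index(drop=True)
dropped = unique_df.drop(unique_df[unique_df["AE"] == "X"].index)
print(dropped["AE"].tolist())  # ['A', 'E', '']
```

When the index carries no meaning, the boolean mask is usually the simpler choice, since it gives the same result whether or not the index is unique.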
