I have a DataFrame with a column (AE) that can contain: nothing (""), "X", "A" or "E".
I want to drop all the rows that have an "X" in that column.
Searching here on StackOverflow, I found two ways of doing it:
df = df.drop(df[df.AE == "X"].index)
or
df = df[df["AE"] != "X"]
But for some reason, the first way drops more rows than it should.
Do those two lines of code do the same thing?
Is there some huge mistake I'm making when trying to do this "drop" using the first command?
>Solution :
They are not the same.
df = df.drop(df[df.AE == "X"].index)
This drops rows by their index label. If the index is not unique, the labels of the rows where df["AE"] == "X" may be shared with other rows, and those rows are dropped as well.
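A minimal sketch of how this can happen, assuming a made-up DataFrame whose index contains a repeated label (for example after concatenating frames without resetting the index). The data and the variable name `dropped` are hypothetical and only serve as illustration:

```python
import pandas as pd

# Hypothetical data: the index label 0 appears twice.
df = pd.DataFrame({"AE": ["X", "A", "E", ""]}, index=[0, 1, 0, 2])

# The labels to drop are taken from the "X" rows: here just the label 0.
labels = df[df.AE == "X"].index

# drop() removes EVERY row carrying that label, so the "E" row
# (which also has label 0) disappears along with the "X" row.
dropped = df.drop(labels)
print(dropped)
```

Only one row contains "X", yet two rows are removed, because both share the label 0.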
df = df[df["AE"] != "X"]
Here we filter the DataFrame with a boolean mask and keep all rows where df["AE"] is different from "X". The index values play no role, and nothing is actually dropped: we simply keep the rows that meet the criterion.
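As a rough illustration, assuming a made-up DataFrame with a repeated index label, the boolean mask removes only the "X" row. If you prefer to keep using drop(), one possible workaround (a suggestion, not something from the question) is to make the index unique first with reset_index(drop=True). The names `filtered`, `unique` and `dropped` are hypothetical:

```python
import pandas as pd

# Hypothetical data: the index label 0 appears twice.
df = pd.DataFrame({"AE": ["X", "A", "E", ""]}, index=[0, 1, 0, 2])
print(df.index.is_unique)  # False, so label-based drop() is risky here

# Boolean mask: rows are selected by position in the mask, not by label.
filtered = df[df["AE"] != "X"]

# Alternative: rebuild a unique 0..n-1 index, then drop by label.
unique = df.reset_index(drop=True)
dropped = unique.drop(unique[unique["AE"] == "X"].index)

print(filtered)
print(dropped)
```

Both results contain the same three rows; they differ only in their index labels, since reset_index(drop=True) discards the original ones.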