If two separate cells in a pandas DataFrame don't contain a given text, drop the entire row?

Pandas Dataframe hypothetical example:

'A' 'B' 'C'
A+1 B+1  1
A+2 B+1  2
A+3 B+1  3

Let's say I want to keep only the rows where column 'A' contains '1' and column 'B' contains '1'; any rows that don't meet this condition get dropped.

So the output dataframe looks like this:

'A' 'B' 'C'
A+1 B+1  1

My attempt was to iterate through each row in column A and B:

for i,j in df.iterrows():
    if "1" in (df['A']) & (df['B']):
        print()
    else:
        df.drop()

But I got this error instead:

TypeError: unsupported operand type(s) for &: 'str' and 'str'

Is there another way to do this?

Solution:

You can use Series.str.contains for the A and B columns to return a mask for each, where the item is True if that item in the column contains 1, False otherwise. Then use & to join them together (i.e., return a new mask where each item is True if both items in the other masks are True, False otherwise), and use the result to index the dataframe:

subset = df[df['A'].str.contains('1') & df['B'].str.contains('1')]

Output:

>>> subset
     A    B  C
0  A+1  B+1  1
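
For reference, here is a minimal self-contained sketch of the same approach, rebuilding the example frame from the question. The names mask_a and mask_b are only illustrative, and the regex=False and na=False arguments are optional additions not in the answer above: they assume you want a literal substring match and want missing values treated as non-matches. The original attempt failed because & was applied to the string columns themselves rather than to boolean masks.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["A+1", "A+2", "A+3"],
    "B": ["B+1", "B+1", "B+1"],
    "C": [1, 2, 3],
})

# One boolean mask per column.
# regex=False treats "1" as a literal substring (assumed intent);
# na=False counts missing values as "no match" (assumed intent).
mask_a = df["A"].str.contains("1", regex=False, na=False)
mask_b = df["B"].str.contains("1", regex=False, na=False)

# Element-wise AND of the two masks; boolean indexing keeps matching rows.
subset = df[mask_a & mask_b]
print(subset)
```

Because the result is a filtered selection, df itself is left unchanged; assign the result back to df if you want to replace the original frame.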