Filter dataframe based on matching values from two columns

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd
cdf = pd.DataFrame({'Id': [1, 2, 3, 4, 5],
                    'Label': [1, 2, 3, 0, 0]})

I would like to filter the dataframe based on the criterion below:

cdf['Id']==cdf['Label']  # first 3 rows are matching for both columns in cdf

I tried the below

flag = np.where[cdf['Id'].eq(cdf['Label'])==True,1,0]
final_df = cdf[cdf['flag']==1]

but I got the below error

TypeError: 'function' object is not subscriptable

I expect my output to look like this:

   Id  Label
0   1      1
1   2      2
2   3      3

>Solution :

I think you’re overthinking this. Just compare the columns:

>>> cdf[cdf['Id'] == cdf['Label']]
   Id  Label
0   1      1
1   2      2
2   3      3

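The error itself comes from subscripting np.where with square brackets, i.e. np.where[...], instead of calling it with parentheses, np.where(...). Note also that even with that fixed, flag would be a standalone array rather than a column of cdf, so cdf['flag'] would raise a KeyError. Either way, the direct comparison above is simpler and should be at least as fast, since it skips building an intermediate flag 😉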
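
For completeness, here is a minimal sketch of how the original attempt could be repaired, shown next to the direct comparison. It assumes the flag is meant to live as a column on a copy of cdf; the names direct, flagged and via_flag are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

cdf = pd.DataFrame({'Id': [1, 2, 3, 4, 5],
                    'Label': [1, 2, 3, 0, 0]})

# Direct boolean mask, as in the answer above.
direct = cdf[cdf['Id'] == cdf['Label']]

# Repaired version of the original attempt:
#   1. call np.where with parentheses instead of square brackets;
#   2. store the result as a column, so that flagged['flag'] exists.
flagged = cdf.copy()  # work on a copy to leave cdf untouched
flagged['flag'] = np.where(flagged['Id'].eq(flagged['Label']), 1, 0)

# Keep the rows where the flag is set and drop the helper column again.
via_flag = flagged.loc[flagged['flag'] == 1, ['Id', 'Label']]

print(direct)
print(via_flag.equals(direct))
```

Both routes select the same three rows; the flag column only pays off if you need the 0/1 marker for something else later.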
