PySpark: exclude rows of one DataFrame that exist in another DataFrame

Assume I have two DataFrames:

DF1: DATA1, DATA1, DATA2, DATA2

DF2: DATA2

I want to remove from DF1 every value that also exists in DF2, while keeping the duplicates in DF1. What should I do?

Expected result: DATA1, DATA1

Solution:

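Use a left anti join.
When you join two DataFrames with a left anti join (left_anti, also accepted as leftanti), the result contains only the columns of the left DataFrame, and only the rows that have no match in the right DataFrame. Duplicate rows in the left DataFrame are kept.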

# Assumes both DataFrames have a column named 'id' to match on
df3 = df1.join(df2, df1['id'] == df2['id'], how='left_anti')