Finding common words in a column based on values from another column

I have a dataframe with a column named source that marks which of two word lists each word came from:

  source   words  letter_count
1  list1   apple             5
2  list1    pear             4
3  list1  banana             6
4  list2    ford             4
5  list2   chevy             5
6  list2   apple             5
7  list2  banana             6

I’m trying to return a new dataframe that shows the words that appear in both list1 and list2:

    words  letter_count
1   apple             5
2  banana             6

I’m using Python and pandas.

>Solution :

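I think you’re looking for pandas.Series.duplicated(). It returns a boolean mask (a Series of True/False values aligned with the rows of the original). With the default keep='first', the first occurrence of each value is marked False and every later occurrence is marked True. Indexing the dataframe with that mask keeps only the repeated rows. Note that this flags any word repeated anywhere in the column, so it matches "words in both lists" only if neither list repeats a word on its own: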

new_df = df[df['words'].duplicated()].drop('source', axis=1)

Output:

>>> new_df
    words  letter_count
6   apple             5
7  banana             6
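
If a word could repeat inside a single list, duplicated() would flag it even though it never appears in the other list. Here is a minimal sketch of a stricter check, not the method from the answer above: it counts how many distinct source values each word has, using groupby with transform("nunique"). The dataframe is rebuilt from the question, and the 1-based index and the variable names sources_per_word and common are assumptions made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (1-based index assumed).
df = pd.DataFrame(
    {
        "source": ["list1", "list1", "list1", "list2", "list2", "list2", "list2"],
        "words": ["apple", "pear", "banana", "ford", "chevy", "apple", "banana"],
        "letter_count": [5, 4, 6, 4, 5, 5, 6],
    },
    index=range(1, 8),
)

# For every row, the number of distinct sources its word appears in.
sources_per_word = df.groupby("words")["source"].transform("nunique")

common = (
    df[sources_per_word > 1]          # keep words seen in more than one list
    .drop(columns="source")           # the source column is no longer needed
    .drop_duplicates(subset="words")  # one row per word
    .reset_index(drop=True)           # fresh 0-based index
)
print(common)
```

The reset_index(drop=True) call gives a fresh 0-based index. To get the 1-based numbering shown in the question, common.index += 1 afterwards should do it.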