Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas Column containing a list of matched strings found using str.contains('|'.join(words))

I have string data that I am matching to a list of terms and I want to create a new column that shows all words found in each row of the dataframe

df = pd.DataFrame({'String': ['Cat Dog Fish', 'Cat Dog', 'Pig Horse', 'DogFish']})
print(df)

         String
0  Cat Dog Fish
1       Cat Dog
2     Pig Horse
3       DogFish
4      CatHorse

words = ['Cat', 'Dog', 'Fish']

I know using df.loc[df['String'].str.contains('|'.join(words))].copy() will return all rows that contain at least one of the terms I’m searching for but I would like output that keeps a record of which terms are found in each string like this:

         String           Matched
0  Cat Dog Fish  [Cat, Dog, Fish]
1       Cat Dog        [Cat, Dog]
2       DogFish       [Dog, Fish]
3      CatHorse             [Cat]

or even just

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

         String     Matched
0  Cat Dog Fish  CatDogFish
1       Cat Dog      CatDog
2       DogFish     DogFish
3      CatHorse         Cat

Not sure where to begin with making this column, any help is appreciated

>Solution :

Try:

words = ["Cat", "Dog", "Fish"]

df["Matched"] = df["String"].apply(lambda s: [w for w in words if w in s])
print(df)

Prints:

         String           Matched
0  Cat Dog Fish  [Cat, Dog, Fish]
1       Cat Dog        [Cat, Dog]
2     Pig Horse                []
3       DogFish       [Dog, Fish]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading