Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how do I succinctly create a new dataframe column based on matching existing column values with list of values?

I want to create a new column in a dataframe by matching the values in an existing column’s values with a predefined list of values. I have two approaches to this below. Both run but dont give me exactly what I want. I prefer the first approach over the second but not sure where I am going wrong with both. I want the solution to be succinct without having to write out alot of np.where statements.

First approach:

words = [['one man ran','two men ran','three men ran'],['red balloons','white shirt','blue dress']]
df3 = pd.DataFrame(words, columns = ['col1','col2','col3'])

search_words1 = ['three','blue']

def columns(search_words1):
    for i in search_words1:
        return "".join(np.where((df3['col3'].str.contains(i)), i, ""))
        
    
df3['col4'] = df3['col3'].apply(lambda x: columns(x))
df3

incomplete result:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel


col1    col2    col3    col4
0   one man ran two men ran three men ran   t
1   red balloons    white shirt blue dress  b

Second approach:

search_words1 = ['three','blue']

def my_comments(search_words1):
        return "".join([i for i in search_words1 if any(i in x for x in df3['col3'])])
    

df3['col4'] = df3['col3'].apply(lambda x: my_comments(x))
df3

incomplete result:


col1    col2    col3    col4
0   one man ran two men ran three men ran   three men ran
1   red balloons    white shirt blue dress  blue dress

Desired output for both approaches:

col1    col2    col3    col4
0   one man ran two men ran three men ran   three
1   red balloons    white shirt blue dress  blue

>Solution :

Use str.extract: create a regex pattern of your search words and try to extract the matched pattern:

pattern = fr"\b({'|'.join(search_words1)})\b"
df3['col4'] = df3['col3'].str.extract(pattern)

Pattern:

>>> print(pattern)
\b(man|red)\b

\b matches the empty string, but only at the beginning or end of a word. The ( ) is the capture group.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading