Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Check if string in df1 exists anywhere in df2 and return the matching column name(s) in df1

I’m trying to check for string matches between two dataframes and return the name of the column the match was found in.

I need to include all matches found in an exact match to be returned as a list in df1.

My Dataframes

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df1 (Data to lookup is in the title column)

    title  counts
0    nike       3
1  adidas       2
2   rebok       2

df1 (Dataframe to search for matches in)

          0       1         2
0      nike  adidas     rebok
1      nike   rebok    hitech
2  converse    puma  converse
3    hitech  adidas      nike

Desired Output

    title  counts  matching_cols
0    nike       3  [0,0,2]
1  adidas       2  [1,1]
2   rebok       2  [1,2]

Minimum Reproducible Example

# make df1
data = [['nike', 3], ['adidas', 2], ['rebok', 2]]
df_counts = pd.DataFrame(data, columns=['title', 'counts'])

# make df2
lu_data = [{0: 'nike', 1: 'adidas', 2: 'rebok'},
        {0: 'nike', 1: 'rebok', 2: 'hitech'},
        {0: 'converse', 1: 'puma', 2: 'converse'},
        {0: 'hitech', 1: 'adidas', 2: 'nike'}
        ]

df_words = pd.DataFrame(lu_data)

What I’ve tried:
I’m a little bit stuck reading a lot of similar threads on stackoverflow / google. The issue I’m having is most solutions are looking to match the name of the column, whereas I’m not looking to return the name of a column when a string matches.

>Solution :

Melt the dataframe then groupby and agg with list to create a mapping dict that maps title to list of column names, then use this dictionary to substitute values in title column of df_counts

d = df_words.melt().groupby('value')['variable'].agg(list)
df_counts['matching_cols'] = df_counts['title'].map(d)

Result

    title  counts matching_cols
0    nike       3     [0, 0, 2]
1  adidas       2        [1, 1]
2   rebok       2        [1, 2]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading