Merge dataframes having array

April 13, 2023

I have two data frames.

DF1

isActive,trackedSearchId
True,53436615
True,53434228
True,53434229

DF2

trackedSearchIds,Primary Keyword Group(s)
"[53436613, 53436615, 53437436, 53436506]",SEO - Directory-Deployment
"[53435887, 53437509, 53437441, 53436615, 53438685, 53437636]",SEO - Other-Glossary
"[53437504, 53435090, 53435887, 53434228]",SEO - Other
"[53437504, 53435090, 53434229]",SEO - Glossary

I want to check for each row of DF1 for column trackedSearchId and check with each row in DF2 with each value of trackedSearchIds array and if the value of DF1 column is present in DF2 append it with DF1.

The output should look like:

isActive,trackedSearchId,Primary Keyword Group(s)
True,53436615,SEO - Directory-Deployment&SEO - Other-Glossary
True,53434228,SEO - Other
True,53434229,SEO - Glossary

>Solution :

Assuming strings representations of lists in df2, use ast.literal_eval, then groupby.agg, explode, and merge:

import ast

df1['trackedSearchId'] = df1['trackedSearchId'].astype(str)

out = df1.merge((df2.assign(trackedSearchIds=df2['trackedSearchIds'].str.findall(r'(\d+)')).explode('trackedSearchIds')
                    .groupby('trackedSearchIds').agg('&'.join)
               ),
               left_on='trackedSearchId', right_on='trackedSearchIds')

Output:

   isActive trackedSearchId                         Primary Keyword Group(s)
0      True        53436615  SEO - Directory-Deployment&SEO - Other-Glossary
1      True        53434228                                      SEO - Other
2      True        53434229                                   SEO - Glossary