Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas search across multiple columns return one column if matches

Example data:

df1 = pd.DataFrame({
    'a': [1, 6, 3, 9],
    'b': ['A', 'B', 'C', 'D'],
    'c': [10, 20, 30, 40],
    'd': [100, 200, 300, 400]
})

df2 = pd.DataFrame({
    'm': [1, 5, 3, 7],
    'n': [2, 6, 8, 4],
    'o': [9, 10, 11, 12]
})

Requirement:
df1['a'] can occur anywhere of df2. I want to return df2['m'] irrespective of where the match is found.

After some googling and chatGpt, I found melting df2 and merging with df1 is helpful except for it doesn’t check for a match in df2['m'].

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Code:

df2_melted = df2.melt(id_vars=['m'], value_vars=['n', 'o'])
merged_df = df1.merge(df2_melted, left_on='a', right_on='value', how='left')
df1['e'] = merged_df['m']
print(df1)

Output:

a  b   c    d    e
1  A  10  100  NaN     # df1['a'] == df2['m']
6  B  20  200  5.0     # df1['a'] == df2['n']
3  C  30  300  NaN     # df1['a'] == df2['m']
9  D  40  400  1.0     # df1['a'] == df2['o']

Required Output:

a  b   c    d  e
1  A  10  100  1
6  B  20  200  5
3  C  30  300  3
9  D  40  400  1

If df2['m'] could also be added to value_vars while melting, it’d have resolved the issue. I tried it, it didn’t work. Then checked docs, found that whatever is there in the id_vars, the remaining or a subset of the remaining can be part of value_vars. So this approach might not be correct or I’m missing something.

Then I thought, if df1['a'] matches df2['m'], then df1['e'] == df1['a'] == df2['m']. So just replacing NaN value with df1['a'] should work and it worked. But had to convert the column to int; because of NaN, it’s changed to float.

Working complete Code:

df2_melted = df2.melt(id_vars=['m'], value_vars=['n', 'o'])
merged_df = df1.merge(df2_melted, left_on='a', right_on='value', how='left')
df1['e'] = merged_df['m']
df1['e'] = (df1['e'].fillna(df1['a'])).astype(int)

Seemed like even though it’s a working solution, it’s unnecessarily complicated: "try any solution: add more code to fix the issues as you proceed without changing the initial solution".

Any other better approach which can help with my requirement?

PS1: In above example, it’s not mandatory that df1 and df2 will have the same number of rows.

>Solution :

Just duplicate the column m as a new column before melt (also use map instead of merge):

df1['e'] = df1['a'].map(df2.eval('e=m').melt('e')
                           .set_index('value')['e'])

Variant with merge:

out = df1.merge(df2.assign(e=df2['m'])
                   .melt('e', value_name='a')
                   [['a', 'e']]
               )

Output:

   a  b   c    d  e
0  1  A  10  100  1
1  6  B  20  200  5
2  3  C  30  300  3
3  9  D  40  400  1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading