Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to get values from a column based on values in another column in pandas

I have a dataframe that looks like sample DF below:

import pandas as pd
df = pd.DataFrame({'user_id' : 
                   ['aefc6',         
                    '7edc3',
                    '6f85b',
                    '6f99b',
                    '6f85b'],
                   'user_name' : ['john', 'jack', 'jill', 'tom', 'jill'],
                   'Event' : ['A','B','C','D', 'E'],
    'Collabs' : ['adsfkj', '6f85b,asdfad', 'adfad', '9b123', '101bv']
})

The scenario is the user_id, user_name columns are those of event managers. But some event managers are also collaborators for some other event as can be seen in the example of "jill" who was a collaborator for Event B.

The "collabs" column is a CSV column and I unnest it as below:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df['new'] = df['Collabs'].str.split(',')
df = df.explode('new')
df['exist'] = df['new'].map(lambda x : x in '|'.join(df['user_id']))

As mentioned above, we can see that ‘jill’ with user_id 6f85b is present in the Collab column. My requirement is I need to get the user_id and user_name of those who appear in the new column or where exist column is True. I.E I need to get the ids and names of managers who acted as collaborators for other events.

Expected output:

    user_id user_name   Event    Collabs        new     exist   collab_user_id  collab_user_name
0   aefc6   john        A        adsfkj         adsfkj  FALSE   NaN             NaN
1   7edc3   jack        B        6f85b,asdfad   6f85b   TRUE    6f85b           jill
1   7edc3   jack        B        6f85b,asdfad   asdfad  FALSE   NaN             NaN
2   6f85b   jill        C        adfad          adfad   FALSE   NaN             NaN
3   6f99b   tom         D        9b123          9b123   FALSE   NaN             NaN
4   6f85b   jill        E        101bv          101bv   FALSE   NaN             NaN

I tried joining:

pd.merge(df, df.loc[:,['user_name', 'new']], left_on = 'user_id', right_on = 'new', how = 'left')

But instead of "jill" the output (understandably) is having "jack". Could someone please let me know how to get the desired output.

>Solution :

You can use isin to create exist column then use np.where to create collab_user_id and map the user_name to collab_user_name

df['exist'] = df['new'].isin(df['user_id'])
df['collab_user_id'] = np.where(df['exist'], df['new'], np.nan)
df['collab_user_name'] = df['collab_user_id'].map(dict(zip(df['user_id'], df['user_name'])))
  user_id user_name Event       Collabs     new  exist collab_user_id collab_user_name
0   aefc6      john     A        adsfkj  adsfkj  False            NaN              NaN
1   7edc3      jack     B  6f85b,asdfad   6f85b   True          6f85b             jill
1   7edc3      jack     B  6f85b,asdfad  asdfad  False            NaN              NaN
2   6f85b      jill     C         adfad   adfad  False            NaN              NaN
3   6f99b       tom     D         9b123   9b123  False            NaN              NaN
4   6f85b      jill     E         101bv   101bv  False            NaN              NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading