How to use dictionary on np.where clause in pandas

I have the following dataframe:

import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})

   id  time col_id  col_a  col_b  col_c
0   1     1    ffp      1     -1     10
1   1     2    ffp      2     -2     20
2   1     3    ffp      3     -3     30
3   2     1    hie      4     -4     40
4   2     2    hie      5     -5     50
5   2     3    ttt      6     -6     60

I would like to create a new column, col, in foo that takes its value from col_a, col_b or col_c, depending on the value of col_id in the same row.

I am doing the following:


foo['col'] = np.where(foo.col_id == "ffp", foo.col_a, 
                      np.where(foo.col_id == "hie",foo.col_b, foo.col_c))

which gives:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60

Since I have a lot of columns, I was wondering whether there is a cleaner way to do this, for example using a dictionary that maps each col_id to its source column:

dict_cols_matching = {"ffp" : "col_a", "hie": "col_b", "ttt": "col_c"}
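To illustrate how such a dictionary could replace the hand-written nesting, here is a minimal sketch using np.select, which is a different technique from the solution below: it builds one boolean condition per dictionary key and pairs it with the matching column. It assumes foo and dict_cols_matching exactly as defined above. Unlike the nested np.where, which sends every row that is not "ffp" or "hie" to col_c, rows whose col_id is missing from the dictionary get NaN here.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# One boolean mask per dictionary key, paired with the column to read from
conditions = [foo['col_id'] == key for key in dict_cols_matching]
choices = [foo[col] for col in dict_cols_matching.values()]

# np.select takes the first matching choice per row; unmatched rows get NaN
foo['col'] = np.select(conditions, choices, default=np.nan)
print(foo['col'].tolist())
```

Because the NaN default is a float, the resulting column is float64 (1.0, 2.0, ...) rather than int64. This approach is easy to read but evaluates one full-length comparison per dictionary entry, so it grows linearly with the number of keys.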

Any ideas?

Solution:

You can map col_id through the dictionary to get, for each row, the name of the column to read from, and then perform an indexed lookup on the underlying NumPy array:

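import numpy as np
import pandas as pd

# Map each row's col_id to its source column name, then encode those names
# as integer codes (idx) plus the unique names in first-seen order (cols)
idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))

# Keep only the referenced columns, in the order of cols, and pick
# one value per row: row i, column idx[i]
foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]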

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
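To make each step of the lookup visible, the sketch below prints the intermediate values for the sample frame. It also adds a guard that is not part of the solution above: pd.factorize encodes missing values as -1, so a col_id with no entry in the dictionary would silently pick the last column through negative indexing, and the check raises a KeyError instead. The reindex / to_numpy / fancy-indexing pattern is the one the pandas documentation suggests as a replacement for DataFrame.lookup, which was deprecated in pandas 1.2 and removed in 2.0.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Step 1: name of the column to read for each row (NaN if col_id is not a key)
names = foo['col_id'].map(dict_cols_matching)
print(names.tolist())  # ['col_a', 'col_a', 'col_a', 'col_b', 'col_b', 'col_c']

# Step 2: integer code per row plus the unique names in first-seen order
idx, cols = pd.factorize(names)
print(idx.tolist(), list(cols))  # [0, 0, 0, 1, 1, 2] ['col_a', 'col_b', 'col_c']

# Guard (added in this sketch, not in the solution): factorize marks NaN
# with -1, which would otherwise select the last column without any error
if (idx == -1).any():
    missing = foo.loc[idx == -1, 'col_id'].unique().tolist()
    raise KeyError(f"col_id values without a mapping: {missing}")

# Step 3: take element (row i, column idx[i]) from the reordered values
values = foo.reindex(cols, axis=1).to_numpy()
foo['col'] = values[np.arange(len(foo)), idx]
print(foo['col'].tolist())  # [1, 2, 3, -4, -5, 60]
```

Since only integer columns are selected here, the result keeps the int64 dtype. If the source columns had mixed dtypes, to_numpy() would upcast to a common dtype (possibly object).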