I have the following DataFrame:
import numpy as np
import pandas as pd
foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
   id  time col_id  col_a  col_b  col_c
0   1     1    ffp      1     -1     10
1   1     2    ffp      2     -2     20
2   1     3    ffp      3     -3     30
3   2     1    hie      4     -4     40
4   2     2    hie      5     -5     50
5   2     3    ttt      6     -6     60
I would like to create a new column, col, in foo that takes the value of col_a, col_b, or col_c, depending on the value of col_id.
I am doing the following:
foo['col'] = np.where(foo.col_id == "ffp", foo.col_a,
                      np.where(foo.col_id == "hie", foo.col_b, foo.col_c))
which gives
   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
Since I have a lot of columns, I was wondering whether there is a cleaner way to do this, for example by using a dictionary:
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}
Any ideas?
Solution:
You can map the values of the dictionary onto col_id, then perform an indexing lookup:
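import numpy as np
# Map each col_id to its source column name, then encode the names as integer codes
idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))
# Take the referenced columns as a 2D array and pick one value per row
foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]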
Output:
   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
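To see why the lookup works, it may help to split the one-liner into its intermediate steps. The sketch below rebuilds the same foo and dict_cols_matching from the question; the names source_cols, values, and picked are only illustrative, and the KeyError guard is an added assumption about how you might want to handle unmapped keys, not part of the original answer.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Step 1: replace each col_id with the name of the column to read from.
source_cols = foo['col_id'].map(dict_cols_matching)

# Step 2: encode the column names as integer codes (idx) plus the
# unique names in order of first appearance (cols).
idx, cols = pd.factorize(source_cols)

# Added guard (an assumption, not in the original answer): a col_id with
# no dictionary entry maps to NaN, which factorize encodes as -1. Used as
# an index, -1 would silently select the last column.
if (idx == -1).any():
    raise KeyError("some col_id values have no entry in dict_cols_matching")

# Step 3: build a 2D array whose columns follow the order of cols.
values = foo.reindex(cols, axis=1).to_numpy()

# Step 4: for row i, take the value in column idx[i].
picked = values[np.arange(len(foo)), idx]

foo['col'] = picked
```

Here idx holds one column position per row and np.arange(len(foo)) supplies the matching row positions, so NumPy pairs them up element by element and returns a single value per row.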