How to use increasing index value as a key to merge with a non-index column

I have two dataframes like as shown below

tdf = pd.DataFrame({'subj_id': [11,12,13,14,15],
                   'dash': np.random.choice(list('PPPS'),size=(5)),
                   'dumeel': np.random.choice(list('QWRR'),size=(5)),
                   'dumma': np.random.choice((1234),size=(5)),
                   'target': np.random.choice([0,1],size=(5))

cdf = pd.DataFrame({'key_index': [1,1,1,3,3],
                   'dash': np.random.choice(list('abcd'),size=(5)),
                   'dumeel': np.random.choice(list('test'),size=(5)),
                   'dumma': np.random.choice((7890),size=(5))

As you can see that cdf contains it’s index (also key) in a column called key_index. However, I have to merge these records with tdf to get the subj_id column.

Basically, key_index = 1 in cdf corresponds to 2nd index row in tdf. key_index = 3 corresponds to 4th index row in tdf etc. key_index + 1 in cdf always corresponds to = matching index position in tdf

I tried the below but it doesn’t work. I don’t know how to add and scale this for million data rows

cdf.merge(tdf, right_index=True, left_index=False) #merge error
cdf.merge(tdf, right_index=False, left_index=True) #merge error

I expect my output to be like as shown below

enter image description here

>Solution :

Don’t merge, use map the single column you want atfer increasing the reference by 1:

cdf['subj_id'] = cdf['key_index'].add(1).map(tdf['subj_id'])


   key_index dash dumeel  dumma  subj_id
0          1    c      e    427       13
1          1    b      s   7090       13
2          1    d      t   4056       13
3          3    a      e   4186       15
4          3    b      s   1433       15

Alternative with merge:


Leave a Reply