python – pandas: function to replicate the creation of a variable

I am trying to replicate the variable aux_35 because I have some missing values in my database. Here is a small sample of the dataset:

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

import dateutil.relativedelta as rd
import math
from itertools import groupby
from itertools import repeat
from operator import itemgetter

import warnings
warnings.filterwarnings('ignore')

df = pd.DataFrame({
    'pdt_050': [
        [0.683522, 0.26141],
        [0.683522, 0.26141],
        [0.683522, 0.26141],
        [0.726501, 0.373269, 0.159278],
        [0.726501, 0.373269, 0.159278],
        [0.596246, 0.288327, 0.120612],
        [0.353175, 0.314364, 0.159139],
        [0.595886, 0.25835],
        [0.582035],
        [0.726501, 0.373269, 0.159278],
        [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597],
        [0.751279, 0.436349, 0.248187, 0.110235],
    ],
    'aux_35': [0.683522, 0.683522, 0.683522, 0.726501, 0.726501, 0.596246,
               0.159139, 0.25835, 0.582035, 0.373269, 0.583463, 0.436349],
    'tob': [1, 1, 1, 1, 1, 1, 14, 2, 1, 1, 0, 1],
})

Basically, aux_35 takes its value from pdt_050, and the position is chosen by the variable tob. For example, when tob is 0 or 1, aux_35 should be the first element of the list in pdt_050; when tob is greater than the number of elements in pdt_050, aux_35 should be the last element of pdt_050, as you can see in row 6 (tob = 14).

I was making the function to replicate that process:

def mmonths(df):
    pdo = []
    pdoriginal = df['pdt_050']
    tob_y = df['aux_35'].astype(int)
    for i in range(len(tob_y)):
        tob = tob_y[i]
        try:
            pdo.append(pdoriginal[i][(tob)])
        except:
            pdo.append(pdoriginal[i][0])
            
    return pdo

df['replica']  = mmonths(df)

But the result is not right: the replica column does not match aux_35. Can you help me, please?

Thanks!

Solution:

Let's apply a custom indexer function along the columns axis (axis=1), i.e. once per row:

def indexer(a, i):
    return a[max(1, min(int(i), len(a))) - 1]

df['aux_35'] = df.apply(lambda s: indexer(s['pdt_050'], s['tob']), axis=1)
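
As a quick sanity check of the clamping, here is a minimal sketch that calls the same indexer on one list taken from the sample. It treats tob as a 1-based position: min(int(i), len(a)) caps it at the list length, max(1, ...) lifts 0 up to 1, and the final - 1 converts to a 0-based index. The variable name row is only for illustration.

```python
def indexer(a, i):
    # Clamp the 1-based position i into [1, len(a)], then shift to 0-based.
    return a[max(1, min(int(i), len(a))) - 1]

# One list from the sample data (row 6 of pdt_050).
row = [0.353175, 0.314364, 0.159139]

print(indexer(row, 0))   # 0.353175 (0 is lifted to position 1)
print(indexer(row, 1))   # 0.353175 (first element)
print(indexer(row, 2))   # 0.314364 (second element)
print(indexer(row, 14))  # 0.159139 (capped at the last element)
```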

Result

                                                      pdt_050  tob    aux_35
0                                         [0.683522, 0.26141]    1  0.683522
1                                         [0.683522, 0.26141]    1  0.683522
2                                         [0.683522, 0.26141]    1  0.683522
3                              [0.726501, 0.373269, 0.159278]    1  0.726501
4                              [0.726501, 0.373269, 0.159278]    1  0.726501
5                              [0.596246, 0.288327, 0.120612]    1  0.596246
6                              [0.353175, 0.314364, 0.159139]   14  0.159139
7                                         [0.595886, 0.25835]    2  0.258350
8                                                  [0.582035]    1  0.582035
9                              [0.726501, 0.373269, 0.159278]    1  0.726501
10  [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597]    0  0.583463
11                   [0.751279, 0.436349, 0.248187, 0.110235]    1  0.751279
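
For comparison, the loop in the question seems to go wrong in three places: it reads the position from aux_35 instead of tob (and astype(int) truncates those values to 0), it uses the position as a 0-based index, and its except branch falls back to the first element rather than the last. A loop-based sketch of the same rule, assuming tob is a 1-based position, could look like this. The name pick_by_tob is hypothetical, and plain lists stand in for the DataFrame columns so the snippet runs on its own.

```python
def pick_by_tob(lists, tobs):
    """Pick one element per list using a 1-based position, clamped to the list."""
    picked = []
    for values, tob in zip(lists, tobs):
        # 0 or 1 -> first element; larger than the list -> last element.
        pos = max(1, min(int(tob), len(values)))
        picked.append(values[pos - 1])
    return picked

# A few rows copied from the sample data.
pdt_050 = [
    [0.683522, 0.26141],
    [0.353175, 0.314364, 0.159139],
    [0.595886, 0.25835],
    [0.583463, 0.366378, 0.262419, 0.19254, 0.1288, 0.064597],
]
tob = [1, 14, 2, 0]

replica = pick_by_tob(pdt_050, tob)
print(replica)  # [0.683522, 0.159139, 0.25835, 0.583463]
```

With the DataFrame from the question, this would be called as df['replica'] = pick_by_tob(df['pdt_050'], df['tob']).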