Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas link column using list item

I have two dataframes df and tf like as given below

df = [{"unique_key": 1, "test_ids": "1.0,15,2.0,nan"}, {"unique_key": 2, "test_ids": "51,75.0,11.0,NaN"},{"unique_key": 3, "test_ids":np.nan},
     {"unique_key": 4, "test_ids":np.nan}]
df = pd.DataFrame(df)

test_ids,status,revenue,cnt_days     
1,passed,234.54,3          
2,passed,543.21,5
11,failed,21.3,4
15,failed,2098.21,6             
51,passed,232,21     
75,failed,123.87,32 

tf = pd.read_clipboard(sep=',')

I would like to link the unique_key column from df to the tf dataframe

For ex: I will show my output below (that’s easy to understand than text)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I was trying something like below

for b in df.test_ids.tolist():
    for a in b.split(','):
        if a >= 0: # to exclude NA values from checking
            for i in len(test_ids):
              if int(a)  == tf['test_ids'][i]:
                   tf['unique_key'] = df['unique_key']
                 

But this is neither efficient nor elegant to solve my problem.

Is there any other better way to achieve the expected output shown below?

enter image description here

>Solution :

You can create Series with remove duplicates and missing values, swap to dictioanry and for new first column use DataFrame.insert with Series.map:

s = (df.set_index('unique_key')['test_ids']
       .str.split(',')
       .explode()
       .astype(float)
       .dropna()
       .astype(int)
       .drop_duplicates()
d = {v: k for k, v in s.items()}
print (d)
{1: 1, 15: 1, 2: 1, 51: 2, 75: 2, 11: 2}

tf.insert(0, 'unique_key', tf['test_ids'].map(d))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32

Another idea is working with DataFrame and create Series for mapping:

s = (df.assign(new = df['test_ids'].str.split(','))
       .explode('new')
       .astype({'new':float})
       .dropna(subset=['new'])
       .astype({'new':int})
       .drop_duplicates(subset=['new'])
       .set_index('new')['unique_key'])

print (s)
new
1     1
15    1
2     1
51    2
75    2
11    2
Name: unique_key, dtype: int64

tf.insert(0, 'unique_key', tf['test_ids'].map(s))
print (tf)
   unique_key  test_ids  status  revenue  cnt_days
0           1         1  passed   234.54         3
1           1         2  passed   543.21         5
2           2        11  failed    21.30         4
3           1        15  failed  2098.21         6
4           2        51  passed   232.00        21
5           2        75  failed   123.87        32
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading