Map multiple columns from pandas dataframe to a dictionary and conditionally return a value to a new column

I have a pandas dataframe with multiple columns and a dictionary with keys that correspond to the column names. I want to check the column values with respect to the dictionary values and return either a ‘yes’ or ‘no’ based on whether the column value meets a "greater than or equal to" condition.

Example:

import pandas as pd
dfdict = {'col1': [1,2,3], 'col2':[2,3,4], 'col3': [3.2, 4.2, 7.7]}
checkdict = {'col1': 2, 'col2': 3, 'col3': 1.5}
df = pd.DataFrame(dfdict)

For each column and each row, check whether the row value is greater than or equal to the value in the dictionary for that column. If any column in a row meets the condition, return a "yes" in a newly created column; otherwise return a "no".

What I’ve tried:

def checkcond(element):
    if not math.isnan(element):
        x = checkdict[element]
        return 1 if element >= x else 0
    else:
        pass

df['test'] = df.applymap(checkcond)

but of course this doesn’t work, because applymap passes each cell value to checkcond rather than the column name, so checkdict[element] looks up the cell value instead of the threshold for that column.

I also tried:

df['test'] = pd.np.where(df[['col1', 'col2', 'col3']].ge(0).any(1, skipna=True), 'Y', 'N')

But that will only take one value for the "ge" condition, whereas I want to check the row value with respect to the dictionary value for each of the columns.

Any suggestions would be appreciated!

Solution:

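Convert your dictionary to a Series and perform a simple comparison. The Series index is aligned with the dataframe's column names, so each column is compared against its own threshold: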

df.ge(pd.Series(checkdict)).replace({True: 'yes', False: 'no'})

output:

  col1 col2 col3
0   no   no  yes
1  yes  yes  yes
2  yes  yes  yes

To aggregate per row (pass the axis by keyword, since recent pandas versions no longer accept it positionally):

df['any'] = df.ge(pd.Series(checkdict)).any(axis=1).map({True: 'yes', False: 'no'})

output:

   col1  col2  col3  any
0     1     2   3.2  yes
1     2     3   4.2  yes
2     3     4   7.7  yes
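
For reference, the approach above can be put together as a self-contained sketch. It makes two choices that are not in the original answer: it selects only the columns named in checkdict before comparing (so the comparison ignores any other columns, such as a result column added earlier), and it uses np.where for the labels. The df_nan frame is hypothetical data, added only to illustrate that NaN compares as False under ge, which means no explicit isnan check should be needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [2, 3, 4], 'col3': [3.2, 4.2, 7.7]})
checkdict = {'col1': 2, 'col2': 3, 'col3': 1.5}

# One threshold per column; the Series index holds the column names.
thresholds = pd.Series(checkdict)

# Compare only the columns listed in the dictionary. ge() aligns the
# Series index with the column labels, so each column gets its own threshold.
mask = df[list(checkdict)].ge(thresholds)

# A row is labelled 'yes' if at least one column meets its threshold.
df['any'] = np.where(mask.any(axis=1), 'yes', 'no')

# Hypothetical data with missing values: NaN >= x evaluates to False,
# so an all-NaN row is labelled 'no'.
df_nan = pd.DataFrame({'col1': [np.nan, 1], 'col2': [np.nan, 5], 'col3': [np.nan, 0.0]})
labels_nan = np.where(df_nan[list(checkdict)].ge(thresholds).any(axis=1), 'yes', 'no')
```

With the example data every row is labelled 'yes', because col3 is always at least 1.5. Swapping any for all would instead require every column to meet its threshold.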