How to match DataFrame index and column against dictionary key and multiple values?

How can I modify the dictionary comprehension below so that column s is also used as a matching criterion?

import pandas as pd 

dct = {'NNI' : pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                             'count': [13, 11, 10,12, 16]},
                            index =['2007-07-13', '2019-09-18', '2016-08-01', '2021-04-05','2017-01-04' ]),
       'NVEC' : pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                              'count': [12, 10, 9,14,5]},
                             index =['2012-10-09', '2018-10-01', '2022-02-01', '2020-03-20','2016-04-06'])
      }

df = pd.DataFrame({'Date': ['2022-02-14', '2022-02-14', '2022-02-14', '2022-02-14', '2022-02-14'], 
                   's': [-1,-1,-1,1,1], 
                   'count': [10, 10, 10, 9, 9]}, 
                  index = ['NNI', 'NVEC', 'IPA', 'LYTS', 'MYN'])

df:

            Date  s  count
NNI   2022-02-14 -1     10
NVEC  2022-02-14 -1     10
IPA   2022-02-14 -1     10
LYTS  2022-02-14  1      9
MYN   2022-02-14  1      9

dct:

{'NNI':       s  count
 2007-07-13  -1     13
 2019-09-18  -1     11
 2016-08-01  -1     10
 2021-04-05   1     12
 2017-01-04   1     16,

 'NVEC':      s  count
 2012-10-09  -1     12
 2018-10-01  -1     10
 2022-02-01  -1      9
 2020-03-20   1     14
 2016-04-06   1      5}

This is what I have so far:

df = df.assign(ratio=pd.Series({k: v['count'].gt(df.loc[k, 'count']).sum() / 
v['count'].ge(df.loc[k, 'count']).sum() for k,v in dct.items()})).fillna(0)

df
            Date  s  count     ratio
NNI   2022-02-14 -1     10  0.800000
NVEC  2022-02-14 -1     10  0.666667
IPA   2022-02-14 -1     10  0.000000
LYTS  2022-02-14  1      9  0.000000
MYN   2022-02-14  1      9  0.000000

Desired result is:

df
            Date  s  count     ratio
NNI   2022-02-14 -1     10  0.666667
NVEC  2022-02-14 -1     10  0.500000
IPA   2022-02-14 -1     10  0.000000
LYTS  2022-02-14  1      9  0.000000
MYN   2022-02-14  1      9  0.000000

>Solution :

You can add the condition as a boolean mask on each DataFrame in dct:

v.loc[v['s'] == df.loc[k, 's'], 'count']

So the code becomes:

df = df.assign(ratio=pd.Series({k: v.loc[v['s'] == df.loc[k, 's'], 'count'].gt(df.loc[k, 'count']).sum() / 
                                v.loc[v['s'] == df.loc[k, 's'], 'count'].ge(df.loc[k, 'count']).sum() 
                                for k,v in dct.items()})).fillna(0)

Output:

            Date  s  count     ratio
NNI   2022-02-14 -1     10  0.666667
NVEC  2022-02-14 -1     10  0.500000
IPA   2022-02-14 -1     10  0.000000
LYTS  2022-02-14  1      9  0.000000
MYN   2022-02-14  1      9  0.000000
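
To see why the mask changes the result, here is a minimal sketch that works through the NNI entry by hand. The names nni, row_s, row_count and matching are introduced only for this illustration; the data is copied from the question.

```python
import pandas as pd

# One frame from the question's dct, rebuilt here so the snippet runs on its own.
nni = pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                    'count': [13, 11, 10, 12, 16]},
                   index=['2007-07-13', '2019-09-18', '2016-08-01',
                          '2021-04-05', '2017-01-04'])

# Values of df.loc['NNI', 's'] and df.loc['NNI', 'count'] in the question.
row_s, row_count = -1, 10

# Keep only the counts whose s matches the row in df.
matching = nni.loc[nni['s'] == row_s, 'count']    # 13, 11, 10

greater = matching.gt(row_count).sum()            # 13 and 11 qualify -> 2
greater_or_equal = matching.ge(row_count).sum()   # 13, 11 and 10 qualify -> 3
ratio = greater / greater_or_equal                # 2 / 3

print(round(ratio, 6))
```

Without the mask, the rows with s == 1 (counts 12 and 16) are also compared, which gives 4 / 5 = 0.8, the value in the question's first attempt.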

As a suggestion, a helper function may be more readable here, because the inline division is hard to follow once the indexing is added. For example:

def get_ratio(df_row, v):
    msk = v['s'] == df_row['s']
    numerator = v.loc[msk, 'count'].gt(df_row['count']).sum()
    denominator = v.loc[msk, 'count'].ge(df_row['count']).sum()
    return numerator / denominator

df = df.assign(ratio = pd.Series({k: get_ratio(df.loc[k], v) for k,v in dct.items()})).fillna(0)
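
One case the helper does not handle explicitly is a denominator of zero, which happens when no row in v matches on s or no matching count reaches the threshold. The division then yields NaN, which fillna(0) later turns into 0. The sketch below is a hypothetical variant, get_ratio_safe, that returns 0.0 directly in that case and skips keys that are not in df's index. Both choices are assumptions about the desired behavior, not part of the original answer.

```python
import pandas as pd

def get_ratio_safe(df_row, v):
    """Hypothetical variant of get_ratio: returns 0.0 when nothing qualifies."""
    # Counts from v whose s matches the row taken from df.
    counts = v.loc[v['s'] == df_row['s'], 'count']
    denominator = counts.ge(df_row['count']).sum()
    if denominator == 0:
        return 0.0  # assumption: treat "no comparable rows" as a ratio of 0
    return counts.gt(df_row['count']).sum() / denominator

# Data copied from the question.
dct = {'NNI': pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                            'count': [13, 11, 10, 12, 16]},
                           index=['2007-07-13', '2019-09-18', '2016-08-01',
                                  '2021-04-05', '2017-01-04']),
       'NVEC': pd.DataFrame({'s': [-1, -1, -1, 1, 1],
                             'count': [12, 10, 9, 14, 5]},
                            index=['2012-10-09', '2018-10-01', '2022-02-01',
                                   '2020-03-20', '2016-04-06'])}

df = pd.DataFrame({'Date': ['2022-02-14'] * 5,
                   's': [-1, -1, -1, 1, 1],
                   'count': [10, 10, 10, 9, 9]},
                  index=['NNI', 'NVEC', 'IPA', 'LYTS', 'MYN'])

# Only keys present in df's index are looked up, so df.loc[k] cannot raise KeyError.
ratios = pd.Series({k: get_ratio_safe(df.loc[k], v)
                    for k, v in dct.items() if k in df.index})

# Rows of df without an entry in dct get NaN from the alignment, then 0.
result = df.assign(ratio=ratios).fillna(0)
print(result)
```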