Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Convert DataFrame with 5*5 grid of choice with string of ranges

Score and Score2 column ranges are not working with df.eq(s) because of the input as ranges.

Default values:-

import pandas as pd

df = pd.DataFrame({'Score': ['>285', '285-280', '280-275', '275-260', '<260'],'Grade1': ['A1', 'A2', 'A3', 'A4', 'A5'],'Score2': ['>270', '270-260', '260-250', '250-200', '<200'],'Grade3': ['D1', 'D2', 'D3', 'D4', 'D5'],'Grade4': ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']})

Input:-

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

df_input = pd.DataFrame({'Score': [290], 'Grade1': ['A1'], 'Score2': [190], 'Grade3': ['D3'], 'Grade4': ['ID2']})

Expected output:-

df_output = {'Tiers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}, 'Score': {0: 290, 1: '', 2: '', 3: '', 4: ''}, 'Grade1': {0: '', 1: '', 2: '', 3: '', 4: 'A1'}, 'Score2': {0: '', 1: '', 2: '', 3: '', 4: 190}, 'Grade3': {0: '', 1: '', 2: 'D3', 3: '', 4: ''}, 'Grade4': {0: '', 1: 'ID2', 2: '', 3: '', 4: ''}}

s = df_input.loc[0]

mask = df.eq(s)

mask.to_dict() = {'Score': {0: False, 1: False, 2: False, 3: False, 4: True}, 'Grade1': {0: True, 1: False, 2: False, 3: False, 4: False}, 'Score2': {0: True, 1: False, 2: False, 3: False, 4: False}, 'Grade3': {0: False, 1: False, 2: True, 3: False, 4: False}, 'Grade4': {0: False, 1: True, 2: False, 3: False, 4: False}}

>Solution :

Modification of my previous answer to handle missing values and arbitrary order:

# get df_input as Series
s = df_input.loc[0]

# find identical values
mask = df.eq(s)

# match ranges
def match_range(val, ref):
    # extract the upper value, for the last replace by np.inf
    s = pd.to_numeric(ref.str.extract('[<-](\d+)', expand=False)).fillna(np.inf)
    # define default output as False
    out = np.zeros(len(ref), dtype=bool)
    # find matching position
    order = np.argsort(s)
    match = np.searchsorted(s[order], val)
    if match < len(s):
        out[order[match]] = True
    return out

# apply match_range on columns for which a direct match failed
m = ~mask.any()
mask.loc[:, m] = df.loc[:, m].apply(lambda x: match_range(s[x.name], x))

# generate output
out = (pd.DataFrame(np.where(mask, s, ''),
                    index=np.arange(len(df))+1,
                    columns=df.columns)
         .rename_axis('Tiers').reset_index()
      )

out

Output:

   Tiers Score Grade1 Score2 Grade3 Grade4
0      1   290     A1                     
1      2                               ID2
2      3                         D3       
3      4                 190              
4      5                                  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading