Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Set a value to true if there are specific values in the row that are NULL?

I’m trying to set the values in my indicator column ‘match’ equal to TRUE if the values in the ‘State_db’, ‘Year_db’, and ‘Type_db’ columns are all NULL. Another condition my code applies is checking if ‘State_j’== ‘State_db’, ‘Year_j’==’Year_db’, and ‘Type_j’==’Type_db’, but this works just fine.

This is what my dataframe looks like:

State_j Year_j  Type_j  File                  State_db   Year_db     Type_db                    
NY      2020    ICF     2020_ICF_R2_NY_01     NaN        NaN         NaN
WA      2020    OP      2020_OP_R10_WA_01     WA         2020.0      IP
WA      2020    IP      2020_IP_R10_WA_01     WA         2020.0      IP
NJ      2021    ICF     2021_ICF_R2_NJ_01     NY         2022.0      ICF
CA      2021    PRTF    2021_PRTF_R2_CA_02    CA         2021.0      NaN
MI      2021    PRTF    2021_PRTF_R2_MI_02    NaN        2021.0      NaN

State_match     Year_match     Type_match
False           False          False
True            True           False
True            True           True
False           False          True
True            True           False
False           True           False

This is what I want the final dataframe to look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

State   Year  Type  File                 State_db   Year_db    Type_db 
NY      2020  ICF   2020_ICF_R2_NY_01    NaN        NaN        NaN   
WA      2020  OP    2020_OP_R10_WA_01    WA         2020.0     IP       
WA      2020  IP    2020_IP_R10_WA_01    WA         2020.0     IP       
NJ      2021  ICF   2021_ICF_R2_NJ_01    NY         2022.0     ICF      
CA      2021  PRTF  2021_PRTF_R2_CA_02   CA         2021.0     NaN      
MI      2021  PRTF  2021_PRTF_R2_MI_02   NaN        2021.0     NaN

State_match     Year_match     Type_match     match
False           False          False          True
True            True           False          False
True            True           True           True
False           False          True           False
True            True           False          False
False           True           False          False  

This is my code:

# Create columns to check for matching state, year, and service type cols
def assemble_report(df):  
    matches_cols = []
    for col in ['State', 'Year', 'Type']:
        df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])  
        matches_cols.append(f'{col}_match') 

    # Create indicator column for whether all keys matched
    df['match'] = df[matches_cols].all(axis='columns')
    df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
    
    return matches_cols

Most of the code works as properly, but it’s this bit that’s giving me issues:

df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"

Basically, I want the indicator value ‘match’ to equal TRUE if the values in ‘State_db’, ‘Year_db’, and ‘Type_db’ are all NULL.

I’m fairly certain the piece of code above is triggering this error:

ValueError: No axis named columns for object type <class 'type'>

What I’m doing wrong and how can I fix this?

>Solution :

In your solution need match all db columns, so create list cols_db, so possible select all columns in df[cols_db]:

def assemble_report(df):  
    
    L =['State', 'Year', 'Type']
    cols_match = [f'{col}_match' for col in L]
    cols_db = [f'{col}_db' for col in L]

    for col in L:
        df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])  


    # Create indicator column for whether all keys matched
    df['match'] = df[cols_match].all(axis='columns') 
    df.loc[df[cols_db].isnull().all(axis='columns'), 'match'] = True

    #Better alternative match both mask by | for bitwise OR
    #df['match'] = (df[cols_match].all(axis='columns') |
    #               df[cols_db].isnull().all(axis='columns'))
    
    return df

print (assemble_report(df))
  State_j  Year_j Type_j                File State_db  Year_db Type_db  \
0      NY    2020    ICF   2020_ICF_R2_NY_01      NaN      NaN     NaN   
1      WA    2020     OP   2020_OP_R10_WA_01       WA   2020.0      IP   
2      WA    2020     IP   2020_IP_R10_WA_01       WA   2020.0      IP   
3      NJ    2021    ICF   2021_ICF_R2_NJ_01       NY   2022.0     ICF   
4      CA    2021   PRTF  2021_PRTF_R2_CA_02       CA   2021.0     NaN   
5      MI    2021   PRTF  2021_PRTF_R2_MI_02      NaN   2021.0     NaN   

   State_match  Year_match  Type_match  match  
0        False       False       False   True  
1         True        True       False  False  
2         True        True        True   True  
3        False       False        True  False  
4         True        True       False  False  
5        False        True       False  False  

Another idea for improve solution is create all lists and compare values of j with db columns, only necessary convert to numpy arrays (because different columns names j and db):

def assemble_report(df):  
    
    L =['State', 'Year', 'Type']
    cols_match = [f'{col}_match' for col in L]
    cols_db = [f'{col}_db' for col in L]
    cols_j = [f'{col}_j' for col in L]
    
    df[cols_match] = (df[cols_j] == df[cols_db].to_numpy())  

    # Create indicator column for whether all keys matched
    df['match'] = df[cols_match].all(axis=1) | df[cols_db].isnull().all(axis=1)
    
    return df

print (assemble_report(df))
  State_j  Year_j Type_j                File State_db  Year_db Type_db  \
0      NY    2020    ICF   2020_ICF_R2_NY_01      NaN      NaN     NaN   
1      WA    2020     OP   2020_OP_R10_WA_01       WA   2020.0      IP   
2      WA    2020     IP   2020_IP_R10_WA_01       WA   2020.0      IP   
3      NJ    2021    ICF   2021_ICF_R2_NJ_01       NY   2022.0     ICF   
4      CA    2021   PRTF  2021_PRTF_R2_CA_02       CA   2021.0     NaN   
5      MI    2021   PRTF  2021_PRTF_R2_MI_02      NaN   2021.0     NaN   

   State_match  Year_match  Type_match  match  
0        False       False       False   True  
1         True        True       False  False  
2         True        True        True   True  
3        False       False        True  False  
4         True        True       False  False  
5        False        True       False  False  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading