I’m trying to set the values in my indicator column ‘match’ equal to TRUE if the values in the ‘State_db’, ‘Year_db’, and ‘Type_db’ columns are all NULL. Another condition my code applies is checking if ‘State_j’== ‘State_db’, ‘Year_j’==’Year_db’, and ‘Type_j’==’Type_db’, but this works just fine.
This is what my dataframe looks like:
State_j Year_j Type_j File State_db Year_db Type_db
NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN
WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP
WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP
NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF
CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN
MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN
State_match Year_match Type_match
False False False
True True False
True True True
False False True
True True False
False True False
This is what I want the final dataframe to look like:
State Year Type File State_db Year_db Type_db
NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN
WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP
WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP
NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF
CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN
MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN
State_match Year_match Type_match match
False False False True
True True False False
True True True True
False False True False
True True False False
False True False False
This is my code:
# Create columns to check for matching state, year, and service type cols
def assemble_report(df):
matches_cols = []
for col in ['State', 'Year', 'Type']:
df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])
matches_cols.append(f'{col}_match')
# Create indicator column for whether all keys matched
df['match'] = df[matches_cols].all(axis='columns')
df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
return matches_cols
Most of the code works as properly, but it’s this bit that’s giving me issues:
df.loc[(df[f'{col}_db'].isnull().all(axis='columns')), 'match'] = "True"
Basically, I want the indicator value ‘match’ to equal TRUE if the values in ‘State_db’, ‘Year_db’, and ‘Type_db’ are all NULL.
I’m fairly certain the piece of code above is triggering this error:
ValueError: No axis named columns for object type <class 'type'>
What I’m doing wrong and how can I fix this?
>Solution :
In your solution need match all db columns, so create list cols_db, so possible select all columns in df[cols_db]:
def assemble_report(df):
L =['State', 'Year', 'Type']
cols_match = [f'{col}_match' for col in L]
cols_db = [f'{col}_db' for col in L]
for col in L:
df[f'{col}_match'] = (df[f'{col}_j'] == df[f'{col}_db'])
# Create indicator column for whether all keys matched
df['match'] = df[cols_match].all(axis='columns')
df.loc[df[cols_db].isnull().all(axis='columns'), 'match'] = True
#Better alternative match both mask by | for bitwise OR
#df['match'] = (df[cols_match].all(axis='columns') |
# df[cols_db].isnull().all(axis='columns'))
return df
print (assemble_report(df))
State_j Year_j Type_j File State_db Year_db Type_db \
0 NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN
1 WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP
2 WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP
3 NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF
4 CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN
5 MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN
State_match Year_match Type_match match
0 False False False True
1 True True False False
2 True True True True
3 False False True False
4 True True False False
5 False True False False
Another idea for improve solution is create all lists and compare values of j with db columns, only necessary convert to numpy arrays (because different columns names j and db):
def assemble_report(df):
L =['State', 'Year', 'Type']
cols_match = [f'{col}_match' for col in L]
cols_db = [f'{col}_db' for col in L]
cols_j = [f'{col}_j' for col in L]
df[cols_match] = (df[cols_j] == df[cols_db].to_numpy())
# Create indicator column for whether all keys matched
df['match'] = df[cols_match].all(axis=1) | df[cols_db].isnull().all(axis=1)
return df
print (assemble_report(df))
State_j Year_j Type_j File State_db Year_db Type_db \
0 NY 2020 ICF 2020_ICF_R2_NY_01 NaN NaN NaN
1 WA 2020 OP 2020_OP_R10_WA_01 WA 2020.0 IP
2 WA 2020 IP 2020_IP_R10_WA_01 WA 2020.0 IP
3 NJ 2021 ICF 2021_ICF_R2_NJ_01 NY 2022.0 ICF
4 CA 2021 PRTF 2021_PRTF_R2_CA_02 CA 2021.0 NaN
5 MI 2021 PRTF 2021_PRTF_R2_MI_02 NaN 2021.0 NaN
State_match Year_match Type_match match
0 False False False True
1 True True False False
2 True True True True
3 False False True False
4 True True False False
5 False True False False