I want to replace values in columns from P1-6 to P1-95 with values from column family if the value in a particular column > 0. Here is the part of my input df te_matrix:
family P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95
Tourist|20 0.712 0.587 0.838 0.675 0.771 0.952 0.746 0.678 0.603 0.728 0.680 0.759
Stowaway|36 0.740 0.809 0.796 0.699 1.000 0.865 0.814 0.888 0.705 0.758 0.759 0.866
Stowaway|4 0.000 0.000 0.000 0.000 0.000 0.000 0.880 0.000 0.000 0.000 0.000 0.000
I write it for one column, P1-6:
te_matrix['P1-6'] = te_matrix.apply(lambda row: row['family'] if row['P1-6'] > 0 else row['P1-6'], axis=1)
and works well:
family P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95
Tourist|20 Tourist|20 0.587 0.838 0.675 0.771 0.952 0.746 0.678 0.603 0.728 0.680 0.759
Stowaway|36 Stowaway|36 0.809 0.796 0.699 1.000 0.865 0.814 0.888 0.705 0.758 0.759 0.866
Stowaway|4 0.000 0.000 0.000 0.000 0.000 0.000 0.880 0.000 0.000 0.000 0.000 0.000
but how to use it for all desired columns without writing almost the same code line by line?
I guess to start with:
cols = te_matrix.filter(like='P1-').columns but how to apply it to the previous code?
>Solution :
Don’t use apply, it’s not needed here really inefficient.
Go with boolean indexing:
df.loc[df['P1-6'].gt(0), 'P1-6'] = df['family']
Output:
family P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95
0 Tourist|20 Tourist|20 0.587 0.838 0.675 0.771 0.952 0.746 0.678 0.603 0.728 0.680 0.759
1 Stowaway|36 Stowaway|36 0.809 0.796 0.699 1.000 0.865 0.814 0.888 0.705 0.758 0.759 0.866
2 Stowaway|4 0.0 0.000 0.000 0.000 0.000 0.000 0.880 0.000 0.000 0.000 0.000 0.000
If you want to update several columns at once, go with mask:
cols = df.filter(like='P1-').columns
df[cols] = df[cols].mask(df[cols].gt(0), df['family'], axis=0)
Or, if you’re unsure if the input columns only contain numbers, with help of pd.to_numeric:
cols = df.filter(like='P1-').columns
df[cols] = df[cols].mask(df[cols].apply(pd.to_numeric, errors='coerce').gt(0),
df['family'], axis=0)
Output:
family P1-6 P1-12 P1-22 P1-25 P1-26 P1-28 P1-88 P1-89 P1-90 P1-92 P1-93 P1-95
0 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20 Tourist|20
1 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36 Stowaway|36
2 Stowaway|4 0.0 0.0 0.0 0.0 0.0 0.0 Stowaway|4 0.0 0.0 0.0 0.0 0.0