I would like to add multiple columns to a dataframe programmatically, using pre-defined rules. As an example, I would like to add three columns to the dataframe below, each indicating whether a row satisfies one of the three rules defined in the code:
import numpy as np
import pandas as pd
#define dataframe
df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})
#define "rules" for adding subsequent columns
rule_1 = (df1["time1"] == 1)
rule_2 = (df1["time2"] == 1)
rule_3 = (df1["time3"] == 1)
#add new columns based on whether or not above rules are satisfied
df1["rule_1"] = np.where(rule_1, 1, 0)
df1["rule_2"] = np.where(rule_2, 1, 0)
df1["rule_3"] = np.where(rule_3, 1, 0)
As you can see, my approach gets tedious when I need to add tens of columns, each based on a different "rule", to a test dataframe.
Is there a way to do this more easily, without defining each column manually along with its own np.where clause? I tried something like this, but pandas does not accept it:
rules = [rule_1, rule_2, rule_3]
for rule in rules:
    df1[rule] = np.where(rule, 1, 0)
Any ideas on how to make my approach more programmatically efficient?
Solution:
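Your loop doesn’t work because df1[rule] uses the rule itself, a boolean Series, as the key for the new column. pandas treats a boolean Series key as a row mask rather than a column label, so the assignment never creates a column named after the rule. Give each new column an explicit string name instead. I would solve it like this: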
rules = [rule_1, rule_2, rule_3]
for i, rule in enumerate(rules):
    df1[f'rule_{i+1}'] = np.where(rule, 1, 0)
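If you would rather choose the column names yourself than rely on list position, one possible variation on the loop above is to keep the rules in a dict keyed by column name. This is only a sketch built on the dataframe from the question. The dict layout and the use of astype(int) in place of np.where are my own choices, not something the question requires:

```python
import pandas as pd

# Same example dataframe as in the question
df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Map each new column name to its boolean rule (names chosen for illustration)
rules = {
    "rule_1": df1["time1"] == 1,
    "rule_2": df1["time2"] == 1,
    "rule_3": df1["time3"] == 1,
}

# Casting a boolean Series to int turns True/False into 1/0,
# which matches np.where(rule, 1, 0) for boolean rules
for name, rule in rules.items():
    df1[name] = rule.astype(int)
```

Adding another rule then only means adding one more entry to the dict, and the column name stays next to the condition that defines it.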