import pandas as pd
import numpy as np

np.random.seed(99)
rows = 10
df = pd.DataFrame({'A': np.random.choice(range(0, 2), rows, replace=True),
                   'B': np.random.choice(range(0, 2), rows, replace=True)})

def get_C1(row):
    return row.A + row.B

def get_C2(row):
    return 'X' if row.A + row.B == 0 else 'Y'

def get_C3(row):
    is_zero = row.A + row.B
    return "X" if is_zero else "Y"

df = df.assign(C = lambda row: get_C3(row))
Why do the get_C2 and get_C3 functions raise an error?
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>Solution :
You’re thinking that df.assign, when passed a function, behaves like df.apply with axis=1, which calls the function for each row.
That’s incorrect. The pandas documentation for df.assign describes the callable case as "Where the value is a callable, evaluated on df".
That means that the function you pass to assign is called on the whole dataframe instead of each individual row.
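To see the difference directly, here is a minimal sketch that records what each method passes to the function. The small three-row frame and the helper name record_and_sum are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical three-row frame, just for illustration.
df = pd.DataFrame({"A": [0, 1, 1], "B": [0, 0, 1]})

seen = []  # type names of every argument the function receives

def record_and_sum(arg):
    # Works for both a DataFrame and a single row, because + is element-wise.
    seen.append(type(arg).__name__)
    return arg.A + arg.B

# assign calls the function with the whole DataFrame.
by_assign = df.assign(C=record_and_sum)

# apply with axis=1 calls the function with one row (a Series) at a time.
by_apply = df.apply(record_and_sum, axis=1)

print(seen)
```

The first recorded type comes from assign and is a DataFrame; the ones that follow come from apply and are Series. This is also why get_C1 works with assign: adding two columns is a valid element-wise operation, so no single truth value is ever needed.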
So, in your function get_C3, the row parameter is not a row at all. It’s a whole dataframe (and should be renamed to df or something else) and so row.A and row.B are two whole columns, rather than single cell values.
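Thus, is_zero is a whole column as well, and ... if is_zero ... will not work: Python needs a single True or False there, and pandas refuses to reduce a whole column to one, which is what raises the ValueError.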
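Two possible ways around this, sketched with the setup from the question. The first keeps the row-wise function and calls it through df.apply with axis=1; the second stays with a callable in assign but uses np.where, which works on whole columns. The result names df_rowwise and df_vectorized are made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(99)
rows = 10
df = pd.DataFrame({
    "A": np.random.choice(range(0, 2), rows, replace=True),
    "B": np.random.choice(range(0, 2), rows, replace=True),
})

def get_C2(row):
    # Here row really is a single row, so the comparison yields one bool.
    return "X" if row.A + row.B == 0 else "Y"

# Option 1: compute the column row by row with apply, then assign it.
df_rowwise = df.assign(C=df.apply(get_C2, axis=1))

# Option 2: keep the callable, but operate on whole columns.
# d is the entire DataFrame; np.where picks "X" or "Y" element-wise.
df_vectorized = df.assign(C=lambda d: np.where(d.A + d.B == 0, "X", "Y"))

print(df_vectorized)
```

Both give the same C column. The np.where version avoids a Python-level call per row, so it tends to be the faster choice on large frames.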