I am currently working on an Amazon products dataset and want to fill the NaNs in the column named "amazon_category_and_sub_category". I want to fill them with the mode of the category for each manufacturer:
modes = X_train.groupby(by="manufacturer")["amazon_category_and_sub_category"].apply(lambda x: np.nan if pd.Series.mode(x).size == 0 else pd.Series.mode(x))
I calculate these modes from the X_train values, but now I want to do the same thing for X_test. As I understand it, I should use the modes from X_train. Before I do that, I need to check whether there is a new manufacturer in the test sample:
nans_test = X_test["amazon_category_and_sub_category"].isna()
nans_test = X_test.loc[nans_test, "manufacturer"].isin(modes.index)
After that, when I try to set the values for the nans_test mask:
X_test.loc[nans_test, "amazon_category_and_sub_category"] = modes[X_test.loc[nans_test, ["manufacturer"]]].to_numpy()
I get an error:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Can you please explain why this is happening and how to fix it?
UPD: I want to first fill the NaNs with modes where possible, and later define a value for the NaNs that are left.
I checked the indices of nans_test and of the object being indexed, but they look the same.
I tried to google the error, but it seems that each case has its own special bug in the code.
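In your code the second mask is computed only on the rows selected by the first one, so its index is a subset of the X_test index and .loc cannot align it. I think you need to build both conditions on the full X_test, chain them with & (bitwise AND), and use Series.map for the mapping: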
m1 = X_test["amazon_category_and_sub_category"].isna()
m2 = X_test["manufacturer"].isin(modes.index)
nans_test = m1 & m2
X_test.loc[nans_test, "amazon_category_and_sub_category"] = X_test.loc[nans_test, "manufacturer"].map(modes)
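
If it helps, here is a minimal sketch of the same idea on made-up data. The toy frames, the col variable and the name subset_mask are my own assumptions for illustration, not taken from your dataset. I also assume you want a single mode per manufacturer, so I take the first one when there are ties; that is a simplification of your lambda, not a claim about what you need.

```python
import numpy as np
import pandas as pd

col = "amazon_category_and_sub_category"

# Hypothetical toy data standing in for X_train / X_test.
X_train = pd.DataFrame({
    "manufacturer": ["A", "A", "A", "B", "B"],
    col: ["toys", "toys", "games", "books", "books"],
})
X_test = pd.DataFrame({
    "manufacturer": ["A", "B", "C", "A"],  # "C" is unseen in X_train
    col: [np.nan, np.nan, np.nan, "games"],
})

# One mode per manufacturer (assumption: first mode wins on ties).
modes = X_train.groupby("manufacturer")[col].agg(
    lambda s: s.mode().iloc[0] if not s.mode().empty else np.nan
)

m1 = X_test[col].isna()

# Mask built from a filtered subset: it only covers the NaN rows,
# so its index is shorter than X_test's and .loc cannot align it.
subset_mask = X_test.loc[m1, "manufacturer"].isin(modes.index)
print(len(subset_mask), len(X_test))

# Mask built from the full column: same index as X_test.
m2 = X_test["manufacturer"].isin(modes.index)

nans_test = m1 & m2
X_test.loc[nans_test, col] = X_test.loc[nans_test, "manufacturer"].map(modes)
print(X_test)
```

The row for manufacturer "C" stays NaN because it has no entry in modes, so you can fill whatever is left afterwards, for example with fillna and a default value of your choice.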