I could’ve sworn df = pd.get_dummies(df, columns=categorical_cols) used to output binary values (0 and 1). It even says it when I hover over get_dummies.
But why is my output Boolean (True/False)?
Here is my code:
import os
import pandas as pd
# loading data
script_dir = os.path.dirname(__file__)
data_path = os.path.join(script_dir, "path/to/my/raw/data.csv")
df = pd.read_csv(data_path)
# Sample
feature_cols = ["list", "of", "feature", "cols"]
categorical_cols = ["list", "of", "categorical", "cols"]
target = "target_col"
X = df[feature_cols].copy()
y = df[target]
# Convert categorical columns to category type
X[categorical_cols] = X[categorical_cols].astype("category")
# Apply get_dummies
X = pd.get_dummies(X, columns=categorical_cols)
Expected Output: Dummy variables to be in binary form (0 and 1).
Actual Output: The output contains Booleans (True/False).
Steps taken:
I had to convert it to 0 and 1 by adding dtype=int.
`df = pd.get_dummies(df, columns=categorical_cols, dtype=int)`
My Questions:
- Why is pd.get_dummies() defaulting to Booleans (True/False) instead of (0/1)?
- Was this updated in newer versions of pandas, or am I doing something wrong?
- Should I use dtype=int all the time?
>Solution :
- Why is pd.get_dummies() defaulting to Booleans (True/False) instead of (0/1)?
I found it in whatsnew/v2.0.0:
Default value of dtype in get_dummies() is changed to bool from uint8 (GH45848)
- Was this updated in newer versions of pandas, or am I doing something wrong?
You are not doing anything wrong. The default changed in pandas 2.0.0: newer versions return True/False, while older versions returned 1/0 (as uint8).
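To see the difference on your own installation, here is a minimal sketch. The toy DataFrame and the `color` column are made up for illustration and are not from the question. The first call relies on the version-dependent default, the second requests integers explicitly.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({"color": ["red", "green", "red"]})

# Default dtype: bool on pandas >= 2.0, uint8 on older versions
default_dummies = pd.get_dummies(df, columns=["color"])

# Explicit dtype: 0/1 integers regardless of the pandas version
int_dummies = pd.get_dummies(df, columns=["color"], dtype=int)

print(pd.__version__)
print(default_dummies.dtypes)
print(int_dummies)
```

Checking `default_dummies.dtypes` next to `pd.__version__` should confirm which default your environment uses.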
- Should I use dtype=int all the time?
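Yes, if you need 0/1 output. Passing dtype=int explicitly gives you integers on both old and new pandas versions, so the result no longer depends on the default.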
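If the dummies have already been created as Booleans, they can also be converted afterwards. This is a sketch on made-up data (the `size` and `price` columns are hypothetical), and it assumes every bool column in the frame is a dummy column that you want as 0/1.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({"size": ["S", "M", "S"], "price": [1.5, 2.0, 3.25]})

# dtype=bool is passed explicitly so the example behaves the same on any version
encoded = pd.get_dummies(df, columns=["size"], dtype=bool)

# Find the Boolean columns and cast only those to int
bool_cols = encoded.select_dtypes(include="bool").columns
encoded = encoded.astype({col: int for col in bool_cols})

print(encoded.dtypes)
```

Non-Boolean columns such as `price` keep their original dtype, which is the reason for casting selected columns rather than the whole frame.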
