I have the following dataset, with multiple IDs, Dates, and other Columns. In one of my columns (Col_X), I have markers 1,2,3, 4, which mark whether a Date is in quarter 1,2,3 or 4, respectively.
| ID | Dates | Col_X | Col_Y |
|---|---|---|---|
| 2038 | 2012-01-02 | 1 | InfoY |
| 2038 | 2012-04-13 | 2 | InfoY |
| 2038 | 2012-02-27 | 1 | InfoY |
| 2120 | 2005-08-05 | 3 | InfoY |
| 2120 | 2009-11-31 | 4 | InfoY |
What I want to do is, I want to add 4 new columns (Q1-Q4), one per quarter, that assign 1 or 0 based on the information that I have in Col_X. To put it more simply, in Q1: I want to have 1 if I have 1 in Col_X, else 0; in Q2: I want to have 1 if I have 2 in Col_X, else 0; etc.
I tried the following which saves the marker in "Col_X" based on "Dates", but I can’t find a way to iterate through Col_X and do one-hot encoding in separate Q1-Q4 columns.
Any ideas, would be greatly appreciated.
>Solution :
You can use pandas.get_dummies after conversion to quarter using pandas.to_datetime+dt.quarter:
NB. I changed 2009-11-31 into 2009-11-30 as the first one is not a valid date
df.join(pd.get_dummies(pd.to_datetime(df['Dates']).dt.quarter).add_prefix('Q'))
output:
ID Dates Col_X Col_Y Q1 Q2 Q3 Q4
0 2038 2012-01-02 1 InfoY 1 0 0 0
1 2038 2012-04-13 2 InfoY 0 1 0 0
2 2038 2012-02-27 1 InfoY 1 0 0 0
3 2120 2005-08-05 3 InfoY 0 0 1 0
4 2120 2009-11-30 4 InfoY 0 0 0 1
edit: simpler method as there is already column ‘Col_X’:
df = pd.get_dummies(df, columns=['Col_X'], prefix='Q', prefix_sep='')