I have two dataframes, df1 has columns with month and year associated with the data and df2 has month (in numbers) as the headers and years as the index values.
I’m then trying to populate a new column in df1 with the appropriate values that correspond to the month/year from df2. I have tried .loc function but not sure if it’s meant to populate a whole column or just return one value at a time.
df1
| other data | month | year |
|---|---|---|
| xyz | 12 | 1966 |
| xyz | 1 | 1997 |
df2
| index | 1 | 2 | 3 | 4 | 5 | …. | 12 |
|---|---|---|---|---|---|---|---|
| 1929 | x | y | z | x | y | …. | z |
| 1930 | x | y | z | x | y | …. | z |
| … | x | y | z | x | y | …. | z |
| 1966 | x | y | z | x | y | …. | z |
| 1997 | x | y | z | x | y | …. | z |
and I want a new column to be added to df1 like this, based on values from df2:
| other data | month | year | df2_value |
|---|---|---|---|
| xyz | 12 | 1966 | z |
| xyz | 1 | 1997 | x |
so far I have been trying this:
df1['df2_value'] = df2.loc[df1['year'],df2['month']]
but I’m getting this key error:
KeyError: "None of [Int64Index([12, 1, 2, 3, 2, 2, 3, 2, 4, 1, 1, 2, 3, 2, 1, 2, 2,\n
2, 2, 2, 12, 3, 1, 2, 12, 1, 2, 11, 3, 1, 2, 1, 3, 12,\n
4, 3, 2, 1, 3, 2, 11, 12, 10, 12, 2, 4, 3, 1, 4, 1, 1,\n
2, 3, 1, 2, 4, 2, 2, 2, 4, 2, 3, 12, 9, 12, 3, 2, 3,\n
1, 2, 3, 11, 11, 4],\n dtype='int64')] are in the [columns]"
I have changed the month and year columns in df1 to object type instead of integer but that didn’t change the error. This is my first time trying to use .loc so could be missing something very obvious or maybe I need to use an entirely different function?
>Solution :
Just stack df2, reset the index and merge
df1.merge(df2.stack().reset_index(),
left_on=['year', 'month'],
right_on=['index', 'level_1'])
other data month year index level_1 0
0 xyz 12 1966 1966 12 z
1 xyz 1 1997 1997 1 x