I am working with a Pandas DataFrame. I want to group the data by one column and then sum the other columns. You can see an example below:
import pandas as pd

data = {'name': ['Company1', 'Company2', 'Company1', 'Company2', 'Company5'],
        'income': [0, 180395, 4543168, 7543168, 73],
        'turnover': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, columns=['name', 'income', 'turnover'])
df
INCOME_GROUPED = df.groupby(['name']).agg({'income': 'sum', 'turnover': 'sum'})
The code above works well and gives the expected result. The next step is selection: I want to select only two columns from the INCOME_GROUPED dataframe.
INCOME_SELECT = INCOME_GROUPED[['name','income']]
But after executing this line of code I get this error:
"None of [Index(['name', 'income'], dtype='object')] are in the [columns]"
Can anybody help me solve this problem?
>Solution :
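You need to call reset_index() after agg(). groupby() moves the grouping column 'name' into the index of the result, so it is no longer a regular column and cannot be selected with [['name', 'income']]. reset_index() turns it back into a column: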
INCOME_GROUPED = df.groupby(['name']).agg({'income': 'sum', 'turnover': 'sum'}).reset_index()
# The trailing .reset_index() is the only addition.
Output:
>>> INCOME_GROUPED[['name', 'income']]
       name   income
0  Company1  4543168
1  Company2  7723563
2  Company5       73
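As an alternative, here is a minimal sketch that avoids the extra call by passing as_index=False to groupby(), which keeps 'name' as a regular column from the start. It also prints the columns and index name of the original grouped result so you can see where 'name' went. The lowercase variable names (grouped_default, income_grouped, income_select) are only illustrative and are not from the question.

```python
import pandas as pd

data = {
    "name": ["Company1", "Company2", "Company1", "Company2", "Company5"],
    "income": [0, 180395, 4543168, 7543168, 73],
    "turnover": [4, 24, 31, 2, 3],
}
df = pd.DataFrame(data)

# Default behaviour: the grouping key becomes the index, not a column.
grouped_default = df.groupby("name").agg({"income": "sum", "turnover": "sum"})
print(list(grouped_default.columns))  # 'name' is not among the columns
print(grouped_default.index.name)     # 'name' is the index name instead

# as_index=False keeps 'name' as an ordinary column in the result.
income_grouped = df.groupby("name", as_index=False).agg(
    {"income": "sum", "turnover": "sum"}
)

# Selecting by column label now works.
income_select = income_grouped[["name", "income"]]
print(income_select)
```

Both approaches should give the same frame here. reset_index() is handy when the grouped result already exists, while as_index=False avoids creating the index in the first place.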