hello I made a table with the help of a pandas, then I created a pivot with two indexes., I would like to group this data like that, but one index was the header of another. Below I will show what is happening in the tables and what I would like the result
Table:
| Name | Lang | Skill | Corp |
|---|---|---|---|
| Michael | java | 2 | |
| Piter | C++ | 3 | |
| Cristiano | python | 5 | |
| Michael | java | 1 | |
| Piter | C++ | 2 | |
| Cristiano | python | 3 | |
| Michael | java | 4 | |
| Piter | C++ | 5 | |
| Cristiano | python | 1 | |
| Michael | python | 2 |
I used:
pivot = pd.pivot_table(df, values="Skill", index=["Corp", "Name"], columns = "Lang", aggfunc="sum")
and I have pivot:
| Corp | Name | C++ | java | python |
|---|---|---|---|---|
| Cristiano | nan | nan | 3 | |
| Michael | nan | 1 | 2 | |
| Piter | 8 | nan | nan | |
| Cristiano | nan | nan | 6 | |
| Michael | nan | 6 | nan | |
| Piter | 2 | nan | nan |
the result I would like:
| Name | C++ | java | python |
|---|---|---|---|
| sum_fb | sum_fb | sum_fb | |
| Cristiano | nan | nan | 3 |
| Michael | nan | 1 | 2 |
| Piter | 8 | nan | nan |
| sum_google | sum_google | sum_google | |
| Cristiano | nan | nan | 6 |
| Michael | nan | 6 | nan |
| Piter | 2 | nan | nan |
Thank You in advance
>Solution :
You can aggregate sum by Corp level, which is first by GroupBy.sum, append index to MultiIndex with same values:
df1 = (pivot.groupby(level=0).sum()
.assign(Name = lambda x: x.index)
.set_index('Name', append=True))
Or use level=[0,0] for MultiIndex, only necessary set names by DataFrame.rename_axis:
df1 = pivot.groupby(level=[0,0]).sum().rename_axis(['Corp','Name'])
print (df1)
Lang C++ java python
Corp Name
Facebook Facebook 8.0 1.0 5.0
Google Google 2.0 6.0 6.0
Then is appended pivot DataFrame by concat, but is necessary sorting for correct ordering by first level of MultiIndex with DataFrame.sort_index, remove first level Corp by DataFrame.droplevel and last convert Name to column with remove columns name Lang by DataFrame.rename_axis:
df = (pd.concat([df1, pivot])
.sort_index(level=0, sort_remaining=False)
.droplevel(0)
.reset_index()
.rename_axis(None, axis=1))
print (df)
Name C++ java python
0 Facebook 8.0 1.0 5.0
1 Cristiano NaN NaN 3.0
2 Michael NaN 1.0 2.0
3 Piter 8.0 NaN NaN
4 Google 2.0 6.0 6.0
5 Cristiano NaN NaN 6.0
6 Michael NaN 6.0 NaN
7 Piter 2.0 NaN NaN