Group data in pivot Pandas

August 11, 2022

hello I made a table with the help of a pandas, then I created a pivot with two indexes., I would like to group this data like that, but one index was the header of another. Below I will show what is happening in the tables and what I would like the result
Table:

Name	Lang	Skill	Corp
Michael	java	2	Google
Piter	C++	3	Facebook
Cristiano	python	5	Google
Michael	java	1	Facebook
Piter	C++	2	Google
Cristiano	python	3	Facebook
Michael	java	4	Google
Piter	C++	5	Facebook
Cristiano	python	1	Google
Michael	python	2	Facebook

I used:

pivot = pd.pivot_table(df, values="Skill", index=["Corp", "Name"], columns = "Lang", aggfunc="sum")

and I have pivot:

Corp	Name	C++	java	python
Facebook	Cristiano	nan	nan	3
Facebook	Michael	nan	1	2
Facebook	Piter	8	nan	nan
Google	Cristiano	nan	nan	6
Google	Michael	nan	6	nan
Google	Piter	2	nan	nan

the result I would like:

Name	C++	java	python
Facebook	sum_fb	sum_fb	sum_fb
Cristiano	nan	nan	3
Michael	nan	1	2
Piter	8	nan	nan
Google	sum_google	sum_google	sum_google
Cristiano	nan	nan	6
Michael	nan	6	nan
Piter	2	nan	nan

Thank You in advance

>Solution :

You can aggregate sum by Corp level, which is first by GroupBy.sum, append index to MultiIndex with same values:

df1 = (pivot.groupby(level=0).sum()
            .assign(Name = lambda x: x.index)
            .set_index('Name', append=True))

Or use level=[0,0] for MultiIndex, only necessary set names by DataFrame.rename_axis:

df1 = pivot.groupby(level=[0,0]).sum().rename_axis(['Corp','Name'])
print (df1)         
Lang               C++  java  python
Corp     Name                       
Facebook Facebook  8.0   1.0     5.0
Google   Google    2.0   6.0     6.0

Then is appended pivot DataFrame by concat, but is necessary sorting for correct ordering by first level of MultiIndex with DataFrame.sort_index, remove first level Corp by DataFrame.droplevel and last convert Name to column with remove columns name Lang by DataFrame.rename_axis:

df = (pd.concat([df1, pivot])
        .sort_index(level=0, sort_remaining=False)
        .droplevel(0)
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
        Name  C++  java  python
0   Facebook  8.0   1.0     5.0
1  Cristiano  NaN   NaN     3.0
2    Michael  NaN   1.0     2.0
3      Piter  8.0   NaN     NaN
4     Google  2.0   6.0     6.0
5  Cristiano  NaN   NaN     6.0
6    Michael  NaN   6.0     NaN
7      Piter  2.0   NaN     NaN