Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas special pivot dataframe

Let’s take a sample dataframe :

df = pd.DataFrame({"Name": ["Alan","Alan","Kate","Kate","Brian"],
                   "Shop" :["A","B","C","A","B"],
                   "Amount":[4,2,1,3,5]})
    Name Shop  Amount
0   Alan    A       4
1   Alan    B       2
2   Kate    C       1
3   Kate    A       3
4  Brian    B       5

First expected output :

I would like to create a new dataframe from df having :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  • as columns all the possible values in the column Shop and the column Name
  • as index all the possible values in the column Shop, repeated for each value in column `Name’
  • as values the value in the column Ă€mount matching with the columns Name and Shop

Expected output :

   A  B  C   Name
A  4  2  0   Alan
B  4  2  0   Alan
C  4  2  0   Alan
A  3  0  1   Kate
B  3  0  1   Kate
C  3  0  1   Kate
A  0  5  0  Brian
B  0  5  0  Brian
C  0  5  0  Brian

Second expected output :

It’s almost the same as the first expected output. The only difference is that the value is the one that match with the index (and not column Name) and the column Shop.

Expected output :

   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian

Thanks to this post, I tried several scripts using pivot_table and pivot but I didn’t reach my expected outputs. Would you know please how to do ?

>Solution :

First use DataFrame.pivot with repalce missing values to 0 and for same order like original values use DataFrame.reindex:

v = df['Shop'].unique()
df1 = (df.pivot('Name','Shop','Amount')
         .fillna(0)
         .astype('int')
         .reindex(df['Name'].unique()))

For repeat rows use Index.repeat with DataFrame.loc and set repeated indices by numpy.tile:

df = (df1.assign(Name=df1.index)
         .loc[df1.index.repeat(len(v))]
         .set_index(np.tile(v, len(v)))
         .rename_axis(None, axis=1))
    
print (df)
   A  B  C   Name
A  4  2  0   Alan
B  4  2  0   Alan
C  4  2  0   Alan
A  3  0  1   Kate
B  3  0  1   Kate
C  3  0  1   Kate
A  0  5  0  Brian
B  0  5  0  Brian
C  0  5  0  Brian

For second output reshape by DataFrame.stack with numpy.broadcast_to:

s = df1.stack()
df2 = (pd.DataFrame(np.broadcast_to(s.to_numpy()[:, None], (len(s), len(v))),
                   columns=v, index= np.tile(v, len(v)))
         .assign(Name=s.index.get_level_values(0)))

print (df2)
   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian

Alternative solution:

arr = np.ravel(df1)
df2 = (pd.DataFrame(np.broadcast_to(arr[:, None], (len(arr), len(v))),
                   columns=v, index= np.tile(v, len(v)))
          .assign(Name=np.repeat(df1.index, len(v)))
         )

print (df2)
   A  B  C   Name
A  4  4  4   Alan
B  2  2  2   Alan
C  0  0  0   Alan
A  3  3  3   Kate
B  0  0  0   Kate
C  1  1  1   Kate
A  0  0  0  Brian
B  5  5  5  Brian
C  0  0  0  Brian
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading