Let’s take a sample dataframe :
df = pd.DataFrame({"Name": ["Alan","Alan","Kate","Kate","Brian"],
"Shop" :["A","B","C","A","B"],
"Amount":[4,2,1,3,5]})
Name Shop Amount
0 Alan A 4
1 Alan B 2
2 Kate C 1
3 Kate A 3
4 Brian B 5
First expected output :
I would like to create a new dataframe from df having :
- as columns all the possible values in the column
Shopand the columnName - as index all the possible values in the column
Shop, repeated for each value in column `Name’ - as values the value in the column
Ă€mountmatching with the columnsNameandShop
Expected output :
A B C Name
A 4 2 0 Alan
B 4 2 0 Alan
C 4 2 0 Alan
A 3 0 1 Kate
B 3 0 1 Kate
C 3 0 1 Kate
A 0 5 0 Brian
B 0 5 0 Brian
C 0 5 0 Brian
Second expected output :
It’s almost the same as the first expected output. The only difference is that the value is the one that match with the index (and not column Name) and the column Shop.
Expected output :
A B C Name
A 4 4 4 Alan
B 2 2 2 Alan
C 0 0 0 Alan
A 3 3 3 Kate
B 0 0 0 Kate
C 1 1 1 Kate
A 0 0 0 Brian
B 5 5 5 Brian
C 0 0 0 Brian
Thanks to this post, I tried several scripts using pivot_table and pivot but I didn’t reach my expected outputs. Would you know please how to do ?
>Solution :
First use DataFrame.pivot with repalce missing values to 0 and for same order like original values use DataFrame.reindex:
v = df['Shop'].unique()
df1 = (df.pivot('Name','Shop','Amount')
.fillna(0)
.astype('int')
.reindex(df['Name'].unique()))
For repeat rows use Index.repeat with DataFrame.loc and set repeated indices by numpy.tile:
df = (df1.assign(Name=df1.index)
.loc[df1.index.repeat(len(v))]
.set_index(np.tile(v, len(v)))
.rename_axis(None, axis=1))
print (df)
A B C Name
A 4 2 0 Alan
B 4 2 0 Alan
C 4 2 0 Alan
A 3 0 1 Kate
B 3 0 1 Kate
C 3 0 1 Kate
A 0 5 0 Brian
B 0 5 0 Brian
C 0 5 0 Brian
For second output reshape by DataFrame.stack with numpy.broadcast_to:
s = df1.stack()
df2 = (pd.DataFrame(np.broadcast_to(s.to_numpy()[:, None], (len(s), len(v))),
columns=v, index= np.tile(v, len(v)))
.assign(Name=s.index.get_level_values(0)))
print (df2)
A B C Name
A 4 4 4 Alan
B 2 2 2 Alan
C 0 0 0 Alan
A 3 3 3 Kate
B 0 0 0 Kate
C 1 1 1 Kate
A 0 0 0 Brian
B 5 5 5 Brian
C 0 0 0 Brian
Alternative solution:
arr = np.ravel(df1)
df2 = (pd.DataFrame(np.broadcast_to(arr[:, None], (len(arr), len(v))),
columns=v, index= np.tile(v, len(v)))
.assign(Name=np.repeat(df1.index, len(v)))
)
print (df2)
A B C Name
A 4 4 4 Alan
B 2 2 2 Alan
C 0 0 0 Alan
A 3 3 3 Kate
B 0 0 0 Kate
C 1 1 1 Kate
A 0 0 0 Brian
B 5 5 5 Brian
C 0 0 0 Brian