I have a CSV file which I read into a panda dataframe using a multiindex header. Here is an example:
df = pd.DataFrame([[1,5,5,2,3], [2,10,4,20,3]])
df.columns = pd.MultiIndex.from_tuples((("1/1/2024","Store"), ("product code 1", "quantity onhand"), ("product code 1",'quantity sold'), ("product code 2", "quantity onhand"), ("product code 2",'quantity sold')))
df
| 1/1/2024
Store |
product code 1
quantity onhand |
product code 1
quantity sold |
product code 2
quantity onhand |
product code 2
quantity sold |
|---|---|---|---|---|
| 1 | 5 | 5 | 2 | 3 |
| 2 | 10 | 4 | 20 | 3 |
The date in the first column is not needed. I tried to stack, but I need to retain the store as it is. Can this be done using pivot to get the following result?
data = [[1,'product code 1',5,5],[1,'product code 2',3,2],[2,'product code 1',4,10],[2,'product code 2',3,20]]
columns = ['Store','Code','quantity sold','quantity onhand']
df2 = pd.DataFrame(data, columns=columns)
df2
| Store | Code | quantity sold | quantity onhand |
|---|---|---|---|
| 1 | product code 1 | 5 | 5 |
| 1 | product code 2 | 3 | 2 |
| 2 | product code 1 | 4 | 10 |
| 2 | product code 2 | 3 | 20 |
>Solution :
You can temporarily set the first column as index, then stack:
(df.set_index(df.columns[0])
.stack(0)
.rename_axis(['Store', 'Code'])
.reset_index()
)
Output:
Store Code quantity onhand quantity sold
0 1 product code 1 5 5
1 1 product code 2 2 3
2 2 product code 1 10 4
3 2 product code 2 20 3