My initial df looks as follows:
import pandas as pd
columns = ["ID", "Position1", "Position2", "Position3", "Duration1", "Duration2", "Duration3", "Costs1", "Costs2", "Costs3"]
values1 = [1, 1, "null", 2, 10, "null", 20, 500, "null", 1500]
values2 = [2, 2, 3, 1, 55, 44, 22, 1000, 2000, 4000]
df = pd.DataFrame([values1, values2], columns=columns)
My goal is to get a dataframe with the following scheme:
| ID | Position | Duration | Costs |
|----|----------|----------|-------|
| 1 | 1 | 10 | 500 |
| 1 | 2 | 20 | 1500 |
| 2 | 1 | 22 | 4000 |
| 2 | 2 | 55 | 2000 |
| 2 | 3 | 44 | 1000 |
I think a good start is the following, but I am not sure how to proceed:
df_melted = df.melt(id_vars="ID")
df_melted = df_melted[df_melted.value != "null"]
df_melted["variable"] = df_melted["variable"].str.replace("(\d+)", "")
getting:
| | ID | variable | value |
|---:|---:|---------:|------:|
| 0 | 1 | Position | 1 |
| 1 | 2 | Position | 2 |
| 3 | 2 | Position | 3 |
| 4 | 1 | Position | 2 |
| 5 | 2 | Position | 1 |
| 6 | 1 | Duration | 10 |
| 7 | 2 | Duration | 55 |
| 9 | 2 | Duration | 44 |
| 10 | 1 | Duration | 20 |
| 11 | 2 | Duration | 22 |
| 12 | 1 | Costs | 500 |
| 13 | 2 | Costs | 1000 |
| 15 | 2 | Costs | 2000 |
| 16 | 1 | Costs | 1500 |
| 17 | 2 | Costs | 4000 |
>Solution :
Code
use wide_to_long
pd.wide_to_long(df, stubnames=['Position', 'Duration', 'Costs'], i='ID', j='val')\
.droplevel(1)[lambda x: x.ne('null').any(axis=1)]\
.sort_values(['ID', 'Position']).reset_index()
output:
ID Position Duration Costs
0 1 1 10 500
1 1 2 20 1500
2 2 1 22 4000
3 2 2 55 1000
4 2 3 44 2000