I have a dataframe as such –
| ID | Status | Data | Column |
|:----:|:------:|:-------------:|:-------|
| 1 | A | Old |111 |
| 2 | B | Old |222 |
| 3 | C | Old |333 |
| 2 | C | New |222 |
| 3 | D | New |333 |
| 1 | E | New |111 |
| 4 | E | New |444 |
| 4 | E | Old |444 |
I want the output to highlight changes based on ID and Data like this , this should be regardless of IDs and Data columns position in row as given in sample above.
So, the result should look like this –
| ID | Change Status | Data | Final Status| Column |
|:----:|:-------------:|:-------------:|:-----------:|:------:|
| 1 | A -> E | New | E | 111 |
| 2 | B -> C | New | C | 222 |
| 3 | C -> D | New | D | 333 |
| 4 | No Change | New | E | 444 |
>Solution :
I would pivot this dataframe to get the changes:
df = df.pivot(index='ID', columns='Data', values='Status')
Data New Old
ID
1 E A
2 C B
3 D C
4 E E
You can then add in the other bits to adjust this into the desired format.
To get the 'Change Status' column, you can use numpy’s select, which will assign a different value if the old and new columns match or not, as below:
values = [df['Old'] + ' -> ' + df['New'], 'No Change']
conditions = [df['New'] != df['Old'], df['New'] == df['Old']]
df['Change Status'] = np.select(conditions, values)
Data New Old Change Status
ID
1 E A A -> E
2 C B B -> C
3 D C C -> D
4 E E No Change
Then the 'Data' column is just populated with ‘New’, so this can be done using:
df['Data'] = 'New'
Data New Old Change Status Data
ID
1 E A A -> E New
2 C B B -> C New
3 D C C -> D New
4 E E No Change New
The 'Final Status' column is just the 'New' column, so you can rename this:
df = df.rename(columns={'New':'Final Status'})
Data Final Status Old Change Status Data
ID
1 E A A -> E New
2 C B B -> C New
3 D C C -> D New
4 E E No Change New
Now you can just select the columns you want, in the order above:
df = df[['Change Status', 'Data', 'Final Status']]
Data Change Status Data Final Status
ID
1 A -> E New E
2 B -> C New C
3 C -> D New D
4 No Change New E
(If you want ID to be a column not an index, you can just use .reset_index())