Highlight Changes in the same Column in new Dataframe

September 14, 2022

I have a dataframe as such –

| ID   | Status | Data          | Column |
|:----:|:------:|:-------------:|:-------|
| 1    | A      | Old           |111     |
| 2    | B      | Old           |222     |
| 3    | C      | Old           |333     |
| 2    | C      | New           |222     |
| 3    | D      | New           |333     |
| 1    | E      | New           |111     |
| 4    | E      | New           |444     |
| 4    | E      | Old           |444     |

I want the output to highlight changes based on ID and Data like this , this should be regardless of IDs and Data columns position in row as given in sample above.

So, the result should look like this –

| ID   | Change Status | Data          | Final Status| Column | 
|:----:|:-------------:|:-------------:|:-----------:|:------:|
| 1    | A    -> E     | New           | E           | 111    | 
| 2    | B    -> C     | New           | C           | 222    |
| 3    | C    -> D     | New           | D           | 333    |
| 4    | No Change     | New           | E           | 444    |

>Solution :

I would pivot this dataframe to get the changes:

df = df.pivot(index='ID', columns='Data', values='Status')

Data New Old
ID          
1      E   A
2      C   B
3      D   C
4      E   E

You can then add in the other bits to adjust this into the desired format.

To get the 'Change Status' column, you can use numpy’s select, which will assign a different value if the old and new columns match or not, as below:

values = [df['Old'] + ' -> ' + df['New'], 'No Change']
conditions = [df['New'] != df['Old'], df['New'] == df['Old']]
df['Change Status'] = np.select(conditions, values)

Data New Old Change Status
ID                        
1      E   A        A -> E
2      C   B        B -> C
3      D   C        C -> D
4      E   E     No Change

Then the 'Data' column is just populated with ‘New’, so this can be done using:

df['Data'] = 'New'

Data New Old Change Status Data
ID                             
1      E   A        A -> E  New
2      C   B        B -> C  New
3      D   C        C -> D  New
4      E   E     No Change  New

The 'Final Status' column is just the 'New' column, so you can rename this:

df = df.rename(columns={'New':'Final Status'})

Data Final Status Old Change Status Data
ID                                      
1               E   A        A -> E  New
2               C   B        B -> C  New
3               D   C        C -> D  New
4               E   E     No Change  New

Now you can just select the columns you want, in the order above:

df = df[['Change Status', 'Data', 'Final Status']]

Data Change Status Data Final Status
ID                                  
1           A -> E  New            E
2           B -> C  New            C
3           C -> D  New            D
4        No Change  New            E

(If you want ID to be a column not an index, you can just use .reset_index())