Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Highlight Changes in the same Column in new Dataframe

I have a dataframe as such –

| ID   | Status | Data          | Column |
|:----:|:------:|:-------------:|:-------|
| 1    | A      | Old           |111     |
| 2    | B      | Old           |222     |
| 3    | C      | Old           |333     |
| 2    | C      | New           |222     |
| 3    | D      | New           |333     |
| 1    | E      | New           |111     |
| 4    | E      | New           |444     |
| 4    | E      | Old           |444     |

I want the output to highlight changes based on ID and Data like this , this should be regardless of IDs and Data columns position in row as given in sample above.

So, the result should look like this –

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

| ID   | Change Status | Data          | Final Status| Column | 
|:----:|:-------------:|:-------------:|:-----------:|:------:|
| 1    | A    -> E     | New           | E           | 111    | 
| 2    | B    -> C     | New           | C           | 222    |
| 3    | C    -> D     | New           | D           | 333    |
| 4    | No Change     | New           | E           | 444    |

>Solution :

I would pivot this dataframe to get the changes:

df = df.pivot(index='ID', columns='Data', values='Status')

Data New Old
ID          
1      E   A
2      C   B
3      D   C
4      E   E

You can then add in the other bits to adjust this into the desired format.

To get the 'Change Status' column, you can use numpy’s select, which will assign a different value if the old and new columns match or not, as below:

values = [df['Old'] + ' -> ' + df['New'], 'No Change']
conditions = [df['New'] != df['Old'], df['New'] == df['Old']]
df['Change Status'] = np.select(conditions, values)

Data New Old Change Status
ID                        
1      E   A        A -> E
2      C   B        B -> C
3      D   C        C -> D
4      E   E     No Change

Then the 'Data' column is just populated with ‘New’, so this can be done using:

df['Data'] = 'New'

Data New Old Change Status Data
ID                             
1      E   A        A -> E  New
2      C   B        B -> C  New
3      D   C        C -> D  New
4      E   E     No Change  New

The 'Final Status' column is just the 'New' column, so you can rename this:

df = df.rename(columns={'New':'Final Status'})

Data Final Status Old Change Status Data
ID                                      
1               E   A        A -> E  New
2               C   B        B -> C  New
3               D   C        C -> D  New
4               E   E     No Change  New

Now you can just select the columns you want, in the order above:

df = df[['Change Status', 'Data', 'Final Status']]

Data Change Status Data Final Status
ID                                  
1           A -> E  New            E
2           B -> C  New            C
3           C -> D  New            D
4        No Change  New            E

(If you want ID to be a column not an index, you can just use .reset_index())

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading