Pandas setting column values to adjacent ones for duplicates found in a key column

Assuming we have the following table:

+---------+------+------+------+------+------+------+------+------+
| COL_ID  | COLB | COLC | COLD | COLE | COLF | COLG | COLH | COLI |
+---------+------+------+------+------+------+------+------+------+
| aa1     |    1 |    1 |      |      |      |      |      |      |
| aa1     |    2 |    1 |      |      |      |      |      |      |
| aa2     |    3 |    1 |      |      |      |      |      |      |
| ab3     |    6 |    2 |      |      |      |      |      |      |
| ab3     |    5 |    2 |      |      |      |      |      |      |
| ab3     |    7 |    1 |      |      |      |      |      |      |
| ab3     |    1 |    1 |      |      |      |      |      |      |
+---------+------+------+------+------+------+------+------+------+

How can we collapse the rows that share a duplicate COL_ID into a single row, placing the values of each duplicate row in the adjacent columns?

+---------+------+------+------+------+------+------+------+------+
| COL_ID  | COLB | COLC | COLD | COLE | COLF | COLG | COLH | COLI |
+---------+------+------+------+------+------+------+------+------+
| aa1     |    1 |    1 |    2 |    1 |      |      |      |      |
| aa2     |    3 |    1 |      |      |      |      |      |      |
| ab3     |    6 |    2 |    5 |    2 |    7 |    1 |    1 |    1 |
+---------+------+------+------+------+------+------+------+------+

Here is the sample code to generate the input table:

import pandas as pd
import numpy as np


# Only COLB and COLC hold values; COLD..COLI are empty placeholder columns
empty = [np.nan] * 7

my_dic = {'COL_ID': ['aa1', 'aa1', 'aa2', 'ab3', 'ab3', 'ab3', 'ab3'],
          'COLB': [1, 2, 3, 6, 5, 7, 1],
          'COLC': [1, 1, 1, 2, 2, 1, 1],
          'COLD': empty,
          'COLE': empty,
          'COLF': empty,
          'COLG': empty,
          'COLH': empty,
          'COLI': empty}

dp = pd.DataFrame(my_dic)

Solution:

Use:

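dp = (dp.set_index('COL_ID')
        .groupby('COL_ID')
        # flatten each group row by row; stack() drops the NaN cells
        # (its default behaviour in pandas 1.x/2.x)
        .apply(lambda x: x.stack().reset_index(drop=True))
        # move the per-group position counter into the columns
        .unstack()
        # the NaN columns made the values float; restore nullable integers
        .astype('Int64')
        .add_prefix('Col')
        .reset_index())
print(dp)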
  COL_ID  Col0  Col1  Col2  Col3  Col4  Col5  Col6  Col7
0    aa1     1     1     2     1  <NA>  <NA>  <NA>  <NA>
1    aa2     3     1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
2    ab3     6     2     5     2     7     1     1     1
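
To see what the lambda does to a single group, here is a minimal sketch on the two aa1 rows alone. It assumes only the two populated columns (COLB and COLC) and uses a default integer index instead of COL_ID; the variable names are illustrative and not part of the answer above.

```python
import pandas as pd

# The two populated columns of the aa1 rows (values copied from the sample table).
group = pd.DataFrame({"COLB": [1, 2], "COLC": [1, 1]})

# stack() walks the frame row by row: (row 0: COLB, COLC), then (row 1: COLB, COLC).
stacked = group.stack()

# Dropping the (row, column) index leaves a plain 0..n-1 position counter,
# which unstack() later turns into the Col0, Col1, ... columns.
flat = stacked.reset_index(drop=True)
print(flat.tolist())  # [1, 1, 2, 1]
```

Each group produces a Series of a different length, which is why the unstacked result is padded with <NA> for the shorter groups.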

Another idea:

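# long format, read row by row; drop the column-name level so only COL_ID remains
s = dp.set_index('COL_ID').stack().droplevel(1)
# running position of each value within its COL_ID
counter = s.groupby('COL_ID').cumcount().to_numpy()
# combine both into a (COL_ID, position) MultiIndex
s.index = [s.index, counter]

# pivot the position level into columns
dp = s.unstack().astype('Int64').add_prefix('Col').reset_index()
print(dp)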
  COL_ID  Col0  Col1  Col2  Col3  Col4  Col5  Col6  Col7
0    aa1     1     1     2     1  <NA>  <NA>  <NA>  <NA>
1    aa2     3     1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
2    ab3     6     2     5     2     7     1     1     1
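
Both answers rely on stack() discarding the empty cells. As far as I understand, the newer implementation that pandas exposes through future_stack=True keeps them, so the following is a sketch of a variant that filters the NaN values explicitly with NumPy instead. The helper name flatten_duplicates and its parameters are hypothetical, not part of the original answer, and it assumes every value column is numeric.

```python
import numpy as np
import pandas as pd

empty = [np.nan] * 7
dp = pd.DataFrame({
    "COL_ID": ["aa1", "aa1", "aa2", "ab3", "ab3", "ab3", "ab3"],
    "COLB": [1, 2, 3, 6, 5, 7, 1],
    "COLC": [1, 1, 1, 2, 2, 1, 1],
    "COLD": empty, "COLE": empty, "COLF": empty,
    "COLG": empty, "COLH": empty, "COLI": empty,
})


def flatten_duplicates(df, key="COL_ID", prefix="Col"):
    """Hypothetical helper: one output row per key, values laid out left to right."""
    rows = {}
    for k, group in df.groupby(key, sort=False):
        # Row-major flattening keeps the values of each original row adjacent.
        values = group.drop(columns=key).to_numpy(dtype="float64").ravel()
        # Explicitly drop the empty (NaN) placeholder cells.
        rows[k] = values[~np.isnan(values)].tolist()
    # Lists of unequal length are padded with NaN when used as rows.
    wide = pd.DataFrame.from_dict(rows, orient="index")
    wide = wide.astype("Int64").add_prefix(prefix)
    return wide.rename_axis(key).reset_index()


result = flatten_duplicates(dp)
print(result)
```

The explicit loop is slower than the vectorised answers on large frames, so it is best seen as a readable fallback rather than a replacement.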