Pandas setting column values to adjacent ones for duplicates found in a key column

by MR

December 22, 2022

Assuming we have the following table:

+---------+------+------+------+------+------+------+------+------+
| COL_ID  | COLB | COLC | COLD | COLE | COLF | COLG | COLH | COLI |
+---------+------+------+------+------+------+------+------+------+
| aa1     |    1 |    1 |      |      |      |      |      |      |
| aa1     |    2 |    1 |      |      |      |      |      |      |
| aa2     |    3 |    1 |      |      |      |      |      |      |
| ab3     |    6 |    2 |      |      |      |      |      |      |
| ab3     |    5 |    2 |      |      |      |      |      |      |
| ab3     |    7 |    1 |      |      |      |      |      |      |
| ab3     |    1 |    1 |      |      |      |      |      |      |
+---------+------+------+------+------+------+------+------+------+

When a COL_ID value is duplicated, how can we move the values from the duplicate rows into the adjacent empty columns of the first row, so that each COL_ID appears only once?

+---------+------+------+------+------+------+------+------+------+
| COL_ID  | COLB | COLC | COLD | COLE | COLF | COLG | COLH | COLI |
+---------+------+------+------+------+------+------+------+------+
| aa1     |    1 |    1 |    2 |    1 |      |      |      |      |
| aa2     |    3 |    1 |      |      |      |      |      |      |
| ab3     |    6 |    2 |    5 |    2 |    7 |    1 |    1 |    1 |
+---------+------+------+------+------+------+------+------+------+

Here is the sample code to generate the input table:

import pandas as pd
import numpy as np


my_dic = {'COL_ID': ['aa1', 'aa1', 'aa2', 'ab3', 'ab3', 'ab3', 'ab3'],
          'COLB': [1, 2, 3, 6, 5, 7, 1],
          'COLC': [1, 1, 1, 2, 2, 1, 1],
          # COLD to COLI start out empty
          'COLD': [np.nan] * 7,
          'COLE': [np.nan] * 7,
          'COLF': [np.nan] * 7,
          'COLG': [np.nan] * 7,
          'COLH': [np.nan] * 7,
          'COLI': [np.nan] * 7}
    
dp = pd.DataFrame(my_dic)

Solution:

Use:

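# Flatten each COL_ID group row by row (stack() is relied on to drop the NaN cells),
# then spread the flattened values back out into numbered columns.
dp = (dp.set_index('COL_ID')
        .groupby('COL_ID')
        .apply(lambda x: x.stack().reset_index(drop=True))
        .unstack()
        .astype('Int64')
        .add_prefix('Col')
        .reset_index())
print(dp)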
  COL_ID  Col0  Col1  Col2  Col3  Col4  Col5  Col6  Col7
0    aa1     1     1     2     1  <NA>  <NA>  <NA>  <NA>
1    aa2     3     1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
2    ab3     6     2     5     2     7     1     1     1
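
To see why the values come out interleaved (COLB, COLC of the first row, then COLB, COLC of the second row, and so on), it can help to look at a single group. The sketch below is not the answer's code: it re-expresses the per-group flattening with plain NumPy, on a hand-built copy of the ab3 rows, under the assumption that empty cells are NaN.

```python
import numpy as np
import pandas as pd

# Rows of the 'ab3' group from the question; one empty column kept for illustration.
group = pd.DataFrame({'COLB': [6, 5, 7, 1],
                      'COLC': [2, 2, 1, 1],
                      'COLD': [np.nan] * 4})

# Read the block row by row (row-major order) ...
flat = group.to_numpy(dtype=float).ravel()
# ... and discard the empty cells, which the answer relies on stack() to do.
flat = flat[~np.isnan(flat)]

print(flat.astype(int).tolist())  # [6, 2, 5, 2, 7, 1, 1, 1]
```

Each group yields one such flat sequence, and unstack() then lays the sequences side by side, padding the shorter ones with missing values.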

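Another idea, starting again from the original dp and numbering the values within each COL_ID instead of using groupby.apply:

# Long Series of all non-empty values, indexed by COL_ID only
s = dp.set_index('COL_ID').stack().droplevel(1)
# Position of each value within its COL_ID (0, 1, 2, ...)
counter = s.groupby('COL_ID').cumcount().to_numpy()
s.index = [s.index, counter]

dp = s.unstack().astype('Int64').add_prefix('Col').reset_index()
print(dp)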
  COL_ID  Col0  Col1  Col2  Col3  Col4  Col5  Col6  Col7
0    aa1     1     1     2     1  <NA>  <NA>  <NA>  <NA>
1    aa2     3     1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
2    ab3     6     2     5     2     7     1     1     1
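
Both variants label the result Col0 to Col7, whereas the question shows the output under the original headers. A minimal sketch of one way to get those headers back, assuming the flattened values never outnumber the existing value columns. The names `value_cols`, `tidy` and `wide` are illustrative only, and the explicit `dropna()` is added so the snippet does not depend on whether `stack()` discards missing values in the installed pandas version:

```python
import numpy as np
import pandas as pd

value_cols = ['COLB', 'COLC', 'COLD', 'COLE', 'COLF', 'COLG', 'COLH', 'COLI']
data = {'COL_ID': ['aa1', 'aa1', 'aa2', 'ab3', 'ab3', 'ab3', 'ab3'],
        'COLB': [1, 2, 3, 6, 5, 7, 1],
        'COLC': [1, 1, 1, 2, 2, 1, 1]}
# The remaining columns start out empty, as in the question.
data.update({col: [np.nan] * 7 for col in value_cols[2:]})
dp = pd.DataFrame(data)

# One row per non-empty value, read row by row within each COL_ID.
s = dp.set_index('COL_ID')[value_cols].stack().dropna().droplevel(1)
tidy = s.rename('value').reset_index()
# Position of each value within its COL_ID group.
tidy['pos'] = tidy.groupby('COL_ID').cumcount()

wide = tidy.pivot(index='COL_ID', columns='pos', values='value')
# Assumption: no group produces more values than there are value columns.
wide.columns = value_cols[:wide.shape[1]]
wide = wide.astype('Int64').reset_index()
print(wide)
```

If a group could overflow the available columns, the slice of `value_cols` would be too short and the assignment would raise, so extra column names would have to be generated first.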
