How to combine columns if they have the same substring in header name?

June 23, 2023

Imagine that I have a dataframe like this:

import numpy as np
import pandas as pd 

data = pd.DataFrame({"ColA:1":[12, 20, 31, np.nan, np.nan, np.nan],
                     "ColA:2":[np.nan, np.nan, 28, 78, 23, 25],
                     "ColB":[np.nan, np.nan, 23, 56, 12, 3],
                     "ColC:1":[56, 10, 35, 67, np.nan, np.nan],
                    "ColC:2":[np.nan, 56, 28, 78, 23, np.nan],
                    "ColC:3":[np.nan, np.nan, np.nan, 43, 17, 8]})

data
Out[6]: 
   ColA:1  ColA:2  ColB  ColC:1  ColC:2  ColC:3
0    12.0     NaN   NaN    56.0     NaN     NaN
1    20.0     NaN   NaN    10.0    56.0     NaN
2    31.0    28.0  23.0    35.0    28.0     NaN
3     NaN    78.0  56.0    67.0    78.0    43.0
4     NaN    23.0  12.0     NaN    23.0    17.0
5     NaN    25.0   3.0     NaN     NaN     8.0

I would like to combine these duplicated and triplicated columns that have the same substring but are distinguished by ":" and a following number. I managed to combine these with the code below:

df_combined = data.groupby(lambda x: x.split(':')[0], axis=1).bfill()

df_combined
Out[8]: 
   ColA:1  ColA:2  ColB  ColC:1  ColC:2  ColC:3
0    12.0     NaN   NaN    56.0     NaN     NaN
1    20.0     NaN   NaN    10.0    56.0     NaN
2    31.0    28.0  23.0    35.0    28.0     NaN
3    78.0    78.0  56.0    67.0    78.0    43.0
4    23.0    23.0  12.0    23.0    23.0    17.0
5    25.0    25.0   3.0     8.0     8.0     8.0

And now I need to stay with the first column of those replicated and also change the name for the first string before ":", so the output should be:

   ColA  ColB  ColC
0  12.0   NaN  56.0
1  20.0   NaN  10.0
2  31.0  23.0  35.0
3  78.0  56.0  67.0
4  23.0  12.0  23.0
5  25.0   3.0   8.0

Anyone could help me?

>Solution :

data.rename(columns=lambda x: x.split(':')[0]).groupby(level=0, axis=1).first()

output :

   ColA     ColB    ColC
0   12.0    NaN     56.0
1   20.0    NaN     10.0
2   31.0    23.0    35.0
3   78.0    56.0    67.0
4   23.0    12.0    23.0
5   25.0    3.0     8.0