Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to combine columns if they have the same substring in header name?

Imagine that I have a dataframe like this:

import numpy as np
import pandas as pd 

data = pd.DataFrame({"ColA:1":[12, 20, 31, np.nan, np.nan, np.nan],
                     "ColA:2":[np.nan, np.nan, 28, 78, 23, 25],
                     "ColB":[np.nan, np.nan, 23, 56, 12, 3],
                     "ColC:1":[56, 10, 35, 67, np.nan, np.nan],
                    "ColC:2":[np.nan, 56, 28, 78, 23, np.nan],
                    "ColC:3":[np.nan, np.nan, np.nan, 43, 17, 8]})

data
Out[6]: 
   ColA:1  ColA:2  ColB  ColC:1  ColC:2  ColC:3
0    12.0     NaN   NaN    56.0     NaN     NaN
1    20.0     NaN   NaN    10.0    56.0     NaN
2    31.0    28.0  23.0    35.0    28.0     NaN
3     NaN    78.0  56.0    67.0    78.0    43.0
4     NaN    23.0  12.0     NaN    23.0    17.0
5     NaN    25.0   3.0     NaN     NaN     8.0

I would like to combine these duplicated and triplicated columns that have the same substring but are distinguished by ":" and a following number. I managed to combine these with the code below:

df_combined = data.groupby(lambda x: x.split(':')[0], axis=1).bfill()

df_combined
Out[8]: 
   ColA:1  ColA:2  ColB  ColC:1  ColC:2  ColC:3
0    12.0     NaN   NaN    56.0     NaN     NaN
1    20.0     NaN   NaN    10.0    56.0     NaN
2    31.0    28.0  23.0    35.0    28.0     NaN
3    78.0    78.0  56.0    67.0    78.0    43.0
4    23.0    23.0  12.0    23.0    23.0    17.0
5    25.0    25.0   3.0     8.0     8.0     8.0

And now I need to stay with the first column of those replicated and also change the name for the first string before ":", so the output should be:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   ColA  ColB  ColC
0  12.0   NaN  56.0
1  20.0   NaN  10.0
2  31.0  23.0  35.0
3  78.0  56.0  67.0
4  23.0  12.0  23.0
5  25.0   3.0   8.0

Anyone could help me?

>Solution :

data.rename(columns=lambda x: x.split(':')[0]).groupby(level=0, axis=1).first()

output :

   ColA     ColB    ColC
0   12.0    NaN     56.0
1   20.0    NaN     10.0
2   31.0    23.0    35.0
3   78.0    56.0    67.0
4   23.0    12.0    23.0
5   25.0    3.0     8.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading