Imagine that I have a dataframe like this:
import numpy as np
import pandas as pd
data = pd.DataFrame({"ColA:1":[12, 20, 31, np.nan, np.nan, np.nan],
"ColA:2":[np.nan, np.nan, 28, 78, 23, 25],
"ColB":[np.nan, np.nan, 23, 56, 12, 3],
"ColC:1":[56, 10, 35, 67, np.nan, np.nan],
"ColC:2":[np.nan, 56, 28, 78, 23, np.nan],
"ColC:3":[np.nan, np.nan, np.nan, 43, 17, 8]})
data
Out[6]:
ColA:1 ColA:2 ColB ColC:1 ColC:2 ColC:3
0 12.0 NaN NaN 56.0 NaN NaN
1 20.0 NaN NaN 10.0 56.0 NaN
2 31.0 28.0 23.0 35.0 28.0 NaN
3 NaN 78.0 56.0 67.0 78.0 43.0
4 NaN 23.0 12.0 NaN 23.0 17.0
5 NaN 25.0 3.0 NaN NaN 8.0
I would like to combine these duplicated and triplicated columns that have the same substring but are distinguished by ":" and a following number. I managed to combine these with the code below:
df_combined = data.groupby(lambda x: x.split(':')[0], axis=1).bfill()
df_combined
Out[8]:
ColA:1 ColA:2 ColB ColC:1 ColC:2 ColC:3
0 12.0 NaN NaN 56.0 NaN NaN
1 20.0 NaN NaN 10.0 56.0 NaN
2 31.0 28.0 23.0 35.0 28.0 NaN
3 78.0 78.0 56.0 67.0 78.0 43.0
4 23.0 23.0 12.0 23.0 23.0 17.0
5 25.0 25.0 3.0 8.0 8.0 8.0
And now I need to stay with the first column of those replicated and also change the name for the first string before ":", so the output should be:
ColA ColB ColC
0 12.0 NaN 56.0
1 20.0 NaN 10.0
2 31.0 23.0 35.0
3 78.0 56.0 67.0
4 23.0 12.0 23.0
5 25.0 3.0 8.0
Anyone could help me?
>Solution :
data.rename(columns=lambda x: x.split(':')[0]).groupby(level=0, axis=1).first()
output :
ColA ColB ColC
0 12.0 NaN 56.0
1 20.0 NaN 10.0
2 31.0 23.0 35.0
3 78.0 56.0 67.0
4 23.0 12.0 23.0
5 25.0 3.0 8.0