Merge specific rows in pandas Df

June 9, 2022

I have df after read_excel where some of values (from one column, with strings) are divided. How can i merge them back?

for example:
the df i have

{'CODE': ['A', None, 'B', None, None, 'C'],
'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
'NUMBER': ['1', None, '2', None, None,'3']}

the df i want

{'CODE': ['A','B','C'],
'TEXT': ['Aa','Bbb','C'],
'NUMBER': ['1','2','3']}

I can’t find the right solution. I tried to import data in different ways but it also did not help

>Solution :

You can forward fill missing values or Nones for groups with aggregate join and first non None value for NUMBER column:

d = {'CODE': ['A', None, 'B', None, None, 'C'],
'TEXT': ['A', 'a', 'B', 'b', 'b', 'C'],
'NUMBER': ['1', None, '2', None, None,'3']}

df = pd.DataFrame(d)


df1 = df.groupby(df['CODE'].ffill()).agg({'TEXT':''.join, 'NUMBER':'first'}).reset_index()
    
print (df1)
  CODE TEXT NUMBER
0    A   Aa      1
1    B  Bbb      2
2    C    C      3

You can generate dictionary:

cols = df.columns.difference(['CODE'])
d1 = dict.fromkeys(cols, 'first')
d1['TEXT'] = ''.join

df1 = df.groupby(df['CODE'].ffill()).agg(d1).reset_index()