How to use groupby in Python to merge text while keeping the other rows fixed?

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'Date':['2022-01-01', '2022-01-01','2022-01-01','2022-02-01','2022-02-01',
                      '2022-03-01','2022-03-01','2022-03-01'],
              'Type': ['R','R','R','P','P','G','G','G'],
              'Class':[1,1,1,0,0,2,2,2],
              'Text':['Hello-','I would like.','to be merged.','with all other.',
                      'sentences that.','belong to my same.','group.','thanks a lot.']})

df.index =[1,1,1,2,2,3,3,3]

What I would like to do is group by the index and join the values of the Text column, while keeping only the first row's value for the other columns.

I tried the following two approaches without success. I probably need to combine them, but I have no idea how to do it.

# Approach 1
df.groupby([df.index],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))

# Approach 2
df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Test': 'join'})

The outcome should be:


Date          Type   Class   Text
2022-01-01     R      1      Hello. I would like to be merged.
2022-02-01     P      0      with all other sentences that.
2022-03-01     G      2      belong to my same. group. thanks a lot.

Can anyone help me do it?

Thanks!

>Solution :

My idea would be to take your second approach, aggregate the text of each group into a list, and then join the strings in each list:

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': list})
new_df['Text'] = new_df['Text'].str.join('')
print(new_df)

Output:


   Date        Type  Class  Text
0  2022-01-01  R     1      Hello-I would like.to be merged.
1  2022-02-01  P     0      with all other.sentences that.
2  2022-03-01  G     2      belong to my same.group.thanks a lot.

I found out you can also do it in a single statement by passing ''.join directly as the aggregation function (same approach):

new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
                    'Type': 'first', 'Class': 'first', 'Text': ''.join})
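
The expected outcome in the question separates the fragments with spaces, so a variant that joins with ' ' instead of '' may be closer to what was asked. The following is a minimal, self-contained sketch under that assumption. It also uses groupby(level=0), which is my substitution for groupby([df.index], as_index=False) and keeps the original index values (1, 2, 3) on the result. The name merged is only illustrative.

```python
import pandas as pd

# Same data as in the question, with the duplicated index set at construction.
df = pd.DataFrame(
    {
        "Date": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-02-01",
                 "2022-02-01", "2022-03-01", "2022-03-01", "2022-03-01"],
        "Type": ["R", "R", "R", "P", "P", "G", "G", "G"],
        "Class": [1, 1, 1, 0, 0, 2, 2, 2],
        "Text": ["Hello-", "I would like.", "to be merged.", "with all other.",
                 "sentences that.", "belong to my same.", "group.", "thanks a lot."],
    },
    index=[1, 1, 1, 2, 2, 3, 3, 3],
)

# Group on the index (level 0). Keep the first value of the fixed columns
# and join the text fragments of each group with a single space.
# Assumption: a space separator is wanted; use "".join for no separator.
merged = df.groupby(level=0).agg(
    {"Date": "first", "Type": "first", "Class": "first", "Text": " ".join}
)
print(merged)
```

Note that this only concatenates the strings as they are. It does not change the punctuation inside the fragments, so the first row still starts with "Hello-" rather than "Hello.".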