Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas groupby use while combining and inserting information on a column based on another column conditions

I have a dataframe like below:

new_df = pd.DataFrame([('231', '2122', '1', 'some_text', 'agent_text_1', 'cust text_1'),
                   ('231', '2682', '2', 'some_text', 'agent_text_2', 'cust text_2'),
                   ('232', '3982', '1', 'some_text', 'agent_text_1', 'cust text_1'),
                   ('233', '1503', '1', 'some_text', 'agent_text_1', 'cust text_1'),
                   ('233', '1692', '2', 'some_text', 'agent_text_2', 'cust text_2'),
                   ('233', '4113', '3', 'some_text', 'agent_text_3', 'cust text_3')
                   ],
                  columns=['case_number', 'call_id', 'call_order', 'description', 'agent_text', 'cust_text'])

I want to group rows that have same ‘case_number’. I also want to create a new column named ‘all_texts_combined’ that will preserve the texts from different ‘call_order’ after grouping. ‘description’ column is same on same ‘case_number’ rows but ‘agent_text’, ‘cust_text’ differs. The scheme I want to create like below:

grouped_df =

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

case_number all_texts_combined
135 summary: {description}. Call order: {call_order}. Agent said: {agent_text_1}. Customer said: {cust text_1}. Call order: {call_order}. Agent said: {agent_text_2}. Customer said: {cust text_2}.
136 summary: {description}. Call order: {call_order}. Agent said: {agent_text_1}. Customer said: {cust text_1}. Call order: {call_order}. Agent said: {agent_text_2}. Customer said: {cust text_2}. Call order: {call_order}. Agent said: {agent_text_3}. Customer said: {cust text_3}.

Notice the ‘description’ is written only once due to it is same on the same cases.

I have tried the below code but it drops the ‘call_order’ column and can’t preserve the ‘description’ text, let alone keeping it the top of the texts.

grouped_df = new_df.groupby(['case_number', 'call_order']).agg({'agent_text': ' '.join, 'cust_text': ' '.join}).reset_index()
grouped_df = grouped_df .groupby(['case_number']).agg({'agent_text': ' '.join, 'cust_text': ' '.join}).reset_index()

>Solution :

I’d start by creating a new column for all_texts_combined in new_df based on the columns you mentioned.

new_df['all_texts_combined'] = 'Call order: ' + new_df['call_order'] + '. Agent said: ' + new_df['agent_text'] + '. Customer said: ' + new_df['cust_text'] + '. '

Then you can create the grouped DataFrame

grouped_df = new_df.groupby('case_number')['all_texts_combined'].agg(' '.join).to_frame()

To join on the description at the start for each case number:

grouped_df = pd.merge(grouped_df, new_df[['case_number', 'description']].drop_duplicates(), on='case_number')
grouped_df['all_texts_combined'] = 'Summary: ' + grouped_df['description'] + '. ' + grouped_df['all_texts_combined']
del grouped_df['description']
grouped_df

Output

case_number all_texts_combined
0 231 Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. Call order: 2. Agent said: agent_text_2. Customer said: cust text_2.
1 232 Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1.
2 233 Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. Call order: 2. Agent said: agent_text_2. Customer said: cust text_2. Call order: 3. Agent said: agent_text_3. Customer said: cust text_3.
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading