Home Pandas groupby use while combining and inserting information on a column based on another column conditions

Questions

Pandas groupby use while combining and inserting information on a column based on another column conditions

September 28, 2023

I have a dataframe like below:

new_df = pd.DataFrame([('231', '2122', '1', 'some_text', 'agent_text_1', 'cust text_1'),
                   ('231', '2682', '2', 'some_text', 'agent_text_2', 'cust text_2'),
                   ('232', '3982', '1', 'some_text', 'agent_text_1', 'cust text_1'),
                   ('233', '1503', '1', 'some_text', 'agent_text_1', 'cust text_1'),
                   ('233', '1692', '2', 'some_text', 'agent_text_2', 'cust text_2'),
                   ('233', '4113', '3', 'some_text', 'agent_text_3', 'cust text_3')
                   ],
                  columns=['case_number', 'call_id', 'call_order', 'description', 'agent_text', 'cust_text'])

I want to group rows that have same ‘case_number’. I also want to create a new column named ‘all_texts_combined’ that will preserve the texts from different ‘call_order’ after grouping. ‘description’ column is same on same ‘case_number’ rows but ‘agent_text’, ‘cust_text’ differs. The scheme I want to create like below:

grouped_df =

case_number	all_texts_combined
135	summary: {description}. Call order: {call_order}. Agent said: {agent_text_1}. Customer said: {cust text_1}. Call order: {call_order}. Agent said: {agent_text_2}. Customer said: {cust text_2}.
136	summary: {description}. Call order: {call_order}. Agent said: {agent_text_1}. Customer said: {cust text_1}. Call order: {call_order}. Agent said: {agent_text_2}. Customer said: {cust text_2}. Call order: {call_order}. Agent said: {agent_text_3}. Customer said: {cust text_3}.

Notice the ‘description’ is written only once due to it is same on the same cases.

I have tried the below code but it drops the ‘call_order’ column and can’t preserve the ‘description’ text, let alone keeping it the top of the texts.

grouped_df = new_df.groupby(['case_number', 'call_order']).agg({'agent_text': ' '.join, 'cust_text': ' '.join}).reset_index()
grouped_df = grouped_df .groupby(['case_number']).agg({'agent_text': ' '.join, 'cust_text': ' '.join}).reset_index()

>Solution :

I’d start by creating a new column for all_texts_combined in new_df based on the columns you mentioned.

new_df['all_texts_combined'] = 'Call order: ' + new_df['call_order'] + '. Agent said: ' + new_df['agent_text'] + '. Customer said: ' + new_df['cust_text'] + '. '

Then you can create the grouped DataFrame

grouped_df = new_df.groupby('case_number')['all_texts_combined'].agg(' '.join).to_frame()

To join on the description at the start for each case number:

grouped_df = pd.merge(grouped_df, new_df[['case_number', 'description']].drop_duplicates(), on='case_number')
grouped_df['all_texts_combined'] = 'Summary: ' + grouped_df['description'] + '. ' + grouped_df['all_texts_combined']
del grouped_df['description']
grouped_df

Output

	case_number	all_texts_combined
0	231	Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. Call order: 2. Agent said: agent_text_2. Customer said: cust text_2.
1	232	Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1.
2	233	Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. Call order: 2. Agent said: agent_text_2. Customer said: cust text_2. Call order: 3. Agent said: agent_text_3. Customer said: cust text_3.