I have a dataframe like below:
new_df = pd.DataFrame([('231', '2122', '1', 'some_text', 'agent_text_1', 'cust text_1'),
('231', '2682', '2', 'some_text', 'agent_text_2', 'cust text_2'),
('232', '3982', '1', 'some_text', 'agent_text_1', 'cust text_1'),
('233', '1503', '1', 'some_text', 'agent_text_1', 'cust text_1'),
('233', '1692', '2', 'some_text', 'agent_text_2', 'cust text_2'),
('233', '4113', '3', 'some_text', 'agent_text_3', 'cust text_3')
],
columns=['case_number', 'call_id', 'call_order', 'description', 'agent_text', 'cust_text'])
I want to group rows that have same ‘case_number’. I also want to create a new column named ‘all_texts_combined’ that will preserve the texts from different ‘call_order’ after grouping. ‘description’ column is same on same ‘case_number’ rows but ‘agent_text’, ‘cust_text’ differs. The scheme I want to create like below:
grouped_df =
| case_number | all_texts_combined |
|---|---|
| 135 | summary: {description}. Call order: {call_order}. Agent said: {agent_text_1}. Customer said: {cust text_1}. Call order: {call_order}. Agent said: {agent_text_2}. Customer said: {cust text_2}. |
| 136 | summary: {description}. Call order: {call_order}. Agent said: {agent_text_1}. Customer said: {cust text_1}. Call order: {call_order}. Agent said: {agent_text_2}. Customer said: {cust text_2}. Call order: {call_order}. Agent said: {agent_text_3}. Customer said: {cust text_3}. |
Notice the ‘description’ is written only once due to it is same on the same cases.
I have tried the below code but it drops the ‘call_order’ column and can’t preserve the ‘description’ text, let alone keeping it the top of the texts.
grouped_df = new_df.groupby(['case_number', 'call_order']).agg({'agent_text': ' '.join, 'cust_text': ' '.join}).reset_index()
grouped_df = grouped_df .groupby(['case_number']).agg({'agent_text': ' '.join, 'cust_text': ' '.join}).reset_index()
>Solution :
I’d start by creating a new column for all_texts_combined in new_df based on the columns you mentioned.
new_df['all_texts_combined'] = 'Call order: ' + new_df['call_order'] + '. Agent said: ' + new_df['agent_text'] + '. Customer said: ' + new_df['cust_text'] + '. '
Then you can create the grouped DataFrame
grouped_df = new_df.groupby('case_number')['all_texts_combined'].agg(' '.join).to_frame()
To join on the description at the start for each case number:
grouped_df = pd.merge(grouped_df, new_df[['case_number', 'description']].drop_duplicates(), on='case_number')
grouped_df['all_texts_combined'] = 'Summary: ' + grouped_df['description'] + '. ' + grouped_df['all_texts_combined']
del grouped_df['description']
grouped_df
Output
| case_number | all_texts_combined | |
|---|---|---|
| 0 | 231 | Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. Call order: 2. Agent said: agent_text_2. Customer said: cust text_2. |
| 1 | 232 | Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. |
| 2 | 233 | Summary: some_text. Call order: 1. Agent said: agent_text_1. Customer said: cust text_1. Call order: 2. Agent said: agent_text_2. Customer said: cust text_2. Call order: 3. Agent said: agent_text_3. Customer said: cust text_3. |