My dataframe looks like this:
| col1 | col2 | col3 |
| ---- | ---- | ---- |
| 1 | abc | txt1 |
| 1 | abc | txt2 |
| 2 | abc | txt3 |
| 1 | xyz | txt4 |
| 2 | xyz | txt5 |
I want to merge the text in col3 across rows only when the rows have the same value in col1 AND the same value in col2.
Expected result:
| col1 | col2 | col3 |
| ---- | ---- | ---------- |
| 1 | abc | txt1, txt2 |
| 2 | abc | txt3 |
| 1 | xyz | txt4 |
| 2 | xyz | txt5 |
I have used this:
df = df.groupby([df['col1'], df['col2']]).aggregate({'col3': ', '.join})
This joins col3 correctly, but it also merges col1 and col2 into a single column (list). How can I achieve the expected result while keeping three separate columns (col1, col2, col3)?
>Solution :
A possible solution, which:
- Performs a group-by operation using two columns, `col1` and `col2`, as the grouping keys, with `as_index=False` so that the keys stay as regular columns instead of becoming the index.
- Aggregates the values in `col3` for each group by applying a lambda function that concatenates them into a single string, with each value separated by a comma and a space.
(df.groupby(['col1', 'col2'], as_index=False)
   .agg({'col3': lambda x: ', '.join(x)}))
Output:
col1 col2 col3
0 1 abc txt1, txt2
1 1 xyz txt4
2 2 abc txt3
3 2 xyz txt5
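
Note that the output above is sorted by the group keys, while the expected result in the question keeps the rows in their original order. Below is a minimal, self-contained sketch of the same approach, rebuilding the sample dataframe from the question and passing `sort=False` to `groupby` so that groups should appear in order of first occurrence. The variable names `result` and `alt` are only illustrative, and the `reset_index()` variant is an alternative to `as_index=False`, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1, 2],
    "col2": ["abc", "abc", "abc", "xyz", "xyz"],
    "col3": ["txt1", "txt2", "txt3", "txt4", "txt5"],
})

# as_index=False keeps col1 and col2 as regular columns;
# sort=False keeps the groups in order of first appearance.
result = (
    df.groupby(["col1", "col2"], as_index=False, sort=False)
      .agg({"col3": ", ".join})
)

# Alternative: group with the keys as the index, then turn the
# index levels back into columns with reset_index().
alt = (
    df.groupby(["col1", "col2"], sort=False)["col3"]
      .agg(", ".join)
      .reset_index()
)

print(result)
```

Passing `', '.join` directly works here because every value in `col3` is a string; if the column could contain non-string values, they would need to be converted first (for example with `astype(str)`).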