I am working on an application that needs to report the count of certain entries in a DataFrame. I am missing something, because my code does not produce the required outcome. Please help.
Input:
| Release | Mapping | Coding |
|-----------|---------|--------|
| release_a | A1 | C2 |
| release_c | A1 | C2 |
| release_a | A1 | C2 |
| release_a | A1 | C1 |
| release_b | B | C1 |
| release_c | B | C2 |
| release_c | B | C3 |
| release_a | C | C1 |
| release_c | A1 | C1 |
| release_c | A1 | C3 |
| release_a | C | C1 |
Expected outcome:
| Release | Mapping |
|-----------|---------------|
| release_a | A1 - 3, C - 2 |
| release_b | B - 1 |
| release_c | A1 - 3, B - 2 |
Code used:

```python
df.groupby(['Release', 'Mapping'])['Coding'].agg(count='count')
```

What I am getting:
Maybe I haven't fully understood how to use the `agg` method. If there is a better alternative, please suggest it. Thanks.
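For reference, here is a sketch that rebuilds the sample data from the Input table and runs the attempt above. The counts it produces are already correct; they just come back as one row per `(Release, Mapping)` pair (a MultiIndex with a `count` column) rather than one joined string per release.

```python
import pandas as pd

# Sample data copied from the "Input" table
df = pd.DataFrame({
    'Release': ['release_a', 'release_c', 'release_a', 'release_a', 'release_b',
                'release_c', 'release_c', 'release_a', 'release_c', 'release_c',
                'release_a'],
    'Mapping': ['A1', 'A1', 'A1', 'A1', 'B', 'B', 'B', 'C', 'A1', 'A1', 'C'],
    'Coding': ['C2', 'C2', 'C2', 'C1', 'C1', 'C2', 'C3', 'C1', 'C1', 'C3', 'C1'],
})

# The original attempt: named aggregation gives a 'count' column,
# indexed by (Release, Mapping), one row per pair
counts = df.groupby(['Release', 'Mapping'])['Coding'].agg(count='count')
print(counts)
```

What is still missing is the step that collapses each release's rows into a single `"Mapping - Count"` string.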
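Solution:

Count the rows per `(Release, Mapping)` pair with `groupby(...).size()`, then group those counts by `Release` and join them into one string per release with `apply`:

```python
# Count each (Release, Mapping) pair, then join the counts per release
result = (
    df.groupby(['Release', 'Mapping']).size()
    .reset_index(name='Count')
    .groupby('Release')[['Mapping', 'Count']]
    .apply(lambda x: ', '.join(f"{m} - {c}" for m, c in zip(x['Mapping'], x['Count'])))
    .reset_index(name='Mapping')
)
```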
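Since the question also asks for alternatives, here is a sketch that skips `apply` and row iteration entirely: it builds the `"Mapping - Count"` labels column-wise and then joins them per release with `agg(', '.join)`. The `Label` column is a temporary helper name chosen for this example only.

```python
import pandas as pd

# Same sample data as the "Input" table
df = pd.DataFrame({
    'Release': ['release_a', 'release_c', 'release_a', 'release_a', 'release_b',
                'release_c', 'release_c', 'release_a', 'release_c', 'release_c',
                'release_a'],
    'Mapping': ['A1', 'A1', 'A1', 'A1', 'B', 'B', 'B', 'C', 'A1', 'A1', 'C'],
    'Coding': ['C2', 'C2', 'C2', 'C1', 'C1', 'C2', 'C3', 'C1', 'C1', 'C3', 'C1'],
})

# 1. Number of rows per (Release, Mapping) pair, sorted by both keys
counts = df.groupby(['Release', 'Mapping']).size().reset_index(name='Count')

# 2. Build "Mapping - Count" labels for all rows at once
counts['Label'] = counts['Mapping'] + ' - ' + counts['Count'].astype(str)

# 3. Join each release's labels into one comma-separated string
result = counts.groupby('Release')['Label'].agg(', '.join).reset_index(name='Mapping')
print(result)
```

Because `groupby` sorts its keys by default, the mappings inside each string come out in alphabetical order, which matches the expected outcome for this data.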
