I have a dataset that uses numpy and pandas and looks at employees by ID # and all activity like promotion, termination, job switch, and etc. What I want to do is know how to count the number of changes and group them to the manager.
Here is a sample of the data. For Reference: 1 = Yes 0 = No
ID Date Job_Title ManagerID Status Terminated Job_Change Team_Change
1 May 2022 Sales Rep 7 Active 0 0 0
1 Oct 2022 Sales Consultant 7 Active 0 1 0
1 Jan 2023 Sales Consultant 7 Active 0 0 0
2 Feb 2022 Tech Advisor 3 Active 0 0 0
2 May 2022 Tech Advisor 3 Termed 1 0 0
3 Dec 2021 Sales Supervisor 7 Active 0 0 0
3 Jan 2022 Tech Supervisor 10 Active 0 1 1
3 Feb 2023 Tech Manager 10 Active 0 1 0
What I want the output to look like:
ManagerID Terminated Job_Change Team Change
3 1 0 0
7 0 1 0
10 0 2 1
Is their a way to print this output out without having to create a new dataframe?
>Solution :
You can achieve the desired output by using the Pandas library to group your dataset by the ManagerID and then aggregating the number of events in each category (Terminated, Job_Change, Team_Change).
Here’s a Python script using the Pandas library to do this:
import pandas as pd
# Assuming your data is stored in a DataFrame called 'data'
# Group the data by 'ManagerID' and sum the respective columns
result = data.groupby('ManagerID')[['Terminated', 'Job_Change',
'Team_Change']].sum()
# Reset the index to make 'ManagerID' a column again
result = result.reset_index()
print(result)
This script groups your data by the ManagerID and then calculates the sum of the Terminated, Job_Change, and Team_Change columns. The output will be a new DataFrame with the desired format.