Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas Groupby python

I have a dataset with the names of the countries and some other information such as salary in the file. The problem is that I need to find mean salaries of employees in the file grouped by country_id and city in ranges (0, 5000] (5000, 10000] and (10000, 15000].
I was using this method but the resultant table is not as what I want. Can you help me with that?

df = file.groupby(['country_id',"city"])['salary'].mean().reset_index(name="mean") 
bins = [0, 5000]
df['binned'] = pd.cut(df['mean'], bins)
print(df)

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I think if your range of the salary bin is always 5000, you can create the bin number of each row by using / operator and math.ceil

import math
df = pd.DataFrame({
    'salary':[
        1231, 5000, 7304, 10000, 12302, 15000,
        1001, 4900, 6012, 9123, 11231, 14923
    ],
    'country': [
        'USA','USA','USA','USA','USA','USA',
        'UK','UK','UK','UK','UK','UK'
    ]
})
df['salary_bin_number'] = (df['salary'] / 5000).apply(lambda x: math.ceil(x))
df.head()

salary country salary_bin_number
1231 USA 1
5000 USA 1
7304 USA 2
10000 USA 2
12302 USA 3

With the salary_bin_number, you can create the column name of bin by using below code

df['salary_range_str'] = df['salary_bin_number'].apply(
    lambda bin_number: f'({(bin_number-1) * 5000}-{(bin_number) * 5000}]'
)

Then group by salary_range_str and country to calculate the average salary in each country,salary_range_str.

df = df.groupby(['country', 'salary_range_str'])['salary'].mean().reset_index()

Finally, pivot the column salary_range_str to columns.

df = pd.pivot_table(df, index='country', columns='salary_range_str', values='salary')

Output

country (0-5000] (10000-15000] (5000-10000]
UK 2950.5 13077 7567.5
USA 3115.5 13651 8652
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading