Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Value_counts() for each of the columns in one dataframe

Let’s suppose I have a dataset like this:

ID Department Level
1 HR senior
2 IT junior
3 IT middle
4 IT middle
5 IT senior
6 HR middle
7 Design middle
8 Design middle

How do i transform this dataset using value_counts() towards every column, so that end result looks like this:

Department Department_Count Level Level_Count
HR 2 middle 1
senior 1
IT 4 junior 1
middle 2
senior 1
Design 2 middle 2

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can do this by creating 2 groupby dataframes, one for each of the counts, and merging them together.

Department counts:

dept = df.groupby('Department', as_index=False).count()[['Department', 'ID']]
dept = dept.rename(columns = {'ID':'Department_Count'})

  Department  ID
0     Design   2
1         HR   2
2         IT   4

Level counts:

level = df.groupby(['Department', 'Level'], as_index=False).count()
level = level.rename(columns = {'ID':'Level_Count'})

  Department   Level  Level_Count
0     Design  middle            2
1         HR  middle            1
2         HR  senior            1
3         IT  junior            1
4         IT  middle            2
5         IT  senior            1

Then merge the two together on Department

df_out = dept.merge(level, on='Department')

  Department  ID   Level  Level_Count
0     Design   2  middle            2
1         HR   2  middle            1
2         HR   2  senior            1
3         IT   4  junior            1
4         IT   4  middle            2
5         IT   4  senior            1

To get the Nans in the Department and ID columns as requested, you could use .loc to find the duplicated rows in those columns, and replace with Nan (will need to import numpy as np):

df_out.loc[df_out[['Department', 'ID']].duplicated(), ['Department', 'ID']] = np.nan

  Department   ID   Level  Level_Count
0     Design  2.0  middle            2
1         HR  2.0  middle            1
2        NaN  NaN  senior            1
3         IT  4.0  junior            1
4        NaN  NaN  middle            2
5        NaN  NaN  senior            1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading