Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Generating column mode in data panel

I have an unbalanced data panel where each period a student can have a certain level/type of scholarship:

head(df)

ID     student_period         scholarship
   
4567        1              scholarship_level_1
4567        2              scholarship_level_2
4567        3              scholarship_level_2
4567        4              scholarship_level_3
5478        4              scholarship_level_3
5478        5              scholarship_level_3
6758        7              scholarship_level_1
6758        8              scholarship_level_2
6758        9              scholarship_level_2

Basically, I want to create a new variable that plots the statistical mode of the scholarship level for each student ID in this panel. Something like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

head(df1)

ID     student_period         scholarship            scholarship_mode
   
4567        1              scholarship_level_1      scholarship_level_2
4567        2              scholarship_level_2      scholarship_level_2 
4567        3              scholarship_level_2      scholarship_level_2
4567        4              scholarship_level_3      scholarship_level_2
5478        4              scholarship_level_3      scholarship_level_3
5478        5              scholarship_level_3      scholarship_level_3
6758        7              scholarship_level_1      scholarship_level_2
6758        8              scholarship_level_2      scholarship_level_2
6758        9              scholarship_level_2      scholarship_level_2


Any ideas?

>Solution :

You can use groupby+transform and value_counts:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.value_counts().index[0]))

or mode:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.mode().iloc[0]))

output:

     ID  student_period          scholarship     scholarship_mode
0  4567               1  scholarship_level_1  scholarship_level_2
1  4567               2  scholarship_level_2  scholarship_level_2
2  4567               3  scholarship_level_2  scholarship_level_2
3  4567               4  scholarship_level_3  scholarship_level_2
4  5478               4  scholarship_level_3  scholarship_level_3
5  5478               5  scholarship_level_3  scholarship_level_3
6  6758               7  scholarship_level_1  scholarship_level_2
7  6758               8  scholarship_level_2  scholarship_level_2
8  6758               9  scholarship_level_2  scholarship_level_2

NB. be aware that mode/value_counts can have ties, in this case only one value will be used.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading