Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to count values from one column based on the description in other column in pandas?

I am new to programming and Python. Lately, I am learning to use pandas.

What I would like to know

I am wondering what would be the best approach to work only on numbers related to Group II (in the attached DataFrame). I mean e.g. sum all grades for group II and column ‘Project’. Sure it won’t make sense to sum grades, but the data is just for illustration purposes.

I’d be grateful for any advices and suggestions.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

My DataFrame

The code attached will generate random numbers (except for the ‘Group’ column) but the DataFrame will always be like that:

     Name Album  Group  Colloquium_1  Colloquium_2  Project
# 0    B  61738     I             5             4        5
# 1    Z  44071     I             5             5        2
# 2    M  87060     I             5             5        5
# 3    L  67974     I             3             5        3
# 4    Z  15617     I             3             2        3
# 5    Z  91872    II             2             4        5
# 6    H  84685    II             4             2        5
# 7    T  17943    II             2             5        2
# 8    L  54302    II             2             5        3
# 9    O  53433    II             5             4        5

Code to generate my DataFrame:

import pandas as pd
import random as rd

def gen_num():
    num = ""
    for i in range(5):
        num += str(rd.randint(0,9))
    return num

names = ['A','B','C','D','E','F','G','H','I','J','K', 'L','M','N','O', \
             'P','R','S','T','W','Z']

list_names = []
list_album = []
list_group = []
list_coll_1 = []
list_coll_2 = []
list_project = []  

num_of_students = 10

for i in range(num_of_students):
    
    list_names.append(rd.choice(names))
    list_album.append(gen_num())
    list_coll_1.append(rd.randint(2, 5))
    list_coll_2.append(rd.randint(2, 5))
    list_project.append(rd.randint(2, 5)) 
    if i < (num_of_students / 2):
        list_group.append('I')
    else:
        list_group.append('II')
        
group = pd.DataFrame(list_names)
group.set_axis(['Name'], axis=1, inplace=True)
group['Album'] = list_album
group['Group'] = list_group
group['Colloquium_1'] = list_coll_1
group['Colloquium_2'] = list_coll_2
group['Project'] = list_project

>Solution :

One solution to this is to filter the DataFrame first:

group[group["Group"] == "II"]["Project"].sum()
#Out: 18

Breaking this up into parts:

First, this part returns a series of bools (True/False) for each row as to whether the values in "Group" are equal to "II":

group["Group"] == "II"
#0    False
#1    False
#2    False
#3    False
#4    False
#5     True
#6     True
#7     True
#8     True
#9     True
#Name: Group, dtype: bool

Next, writing this into group[] returns a filtered dataframe for those rows that are True:

group[group["Group"] == "II"]
#  Name  Album Group  Colloquium_1  Colloquium_2  Project
#5    E  77371    II             4             5        3
#6    N  90525    II             4             3        3
#7    H  89889    II             3             4        5
#8    T  88154    II             3             4        5
#9    E  56176    II             3             2        2

Using ["Project"] on the end returns a pandas Series of the values in the column:

group[group["Group"] == "II"]["Project"]
#5    3
#6    3
#7    5
#8    5
#9    2
#Name: Project, dtype: int64

And lastly .sum() returns the sum of the series (18).

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading