Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Summing values in column of dataframe according to nonunique values in another column

I am learning pandas, and have come across a super easy problem, but not sure how to solve it. I have a dataframe with 2 columns. One column is categorical with three levels i.e. the values are non-unique, and one column is quantitative. For example,

df = pd.DataFrame({"X": list("AAABBC"), "Y": range(6)})

I want to sum each value in the Y column according to each unique-value in the X column. I.e. I should obtain a dataframe with 2 columns; one column being the unique values in X i.e. ["A","B","C"] and the other column being the summed values in Y corresponding to each level in X i.e.[3,7,5].

This is obviously quite a basic thing to do, but I have tried googling and I couldn’t find the answer, so it’s quite frustrating. I think the answer should be quite simply, probably a one-liner, but I just don’t know the command. I am quite new to pandas, so please go easy 🙂

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You are looking for the .groupby method. Relevant docs:

Usage example:

import pandas as pd

df = pd.DataFrame({"X": list("AAABBC"), "Y": range(6)})

df_by_x = df.groupby("X").sum()

print(df_by_x)

Result:

   Y
X
A  3
B  7
C  5

As I’m sure you can imagine, there are many many other things you can do with grouping in Pandas. There are a handful of different ways to solve the problem you proposed here, although the "one obvious way to do it" would be the .sum() method as above. However, it might be a useful learning exercise to work through some of the other ways you might accomplish the same task.

Another related search term for this operation is "split-apply-combine", but that was somewhat of a short-lived trend in terminology (mostly confined to R users), while "group by" is a long-standing term inherited from SQL.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading