Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas groupby apply result same size as the original dataframe

Using data.table in R I can get group conditional sum of some column like this:

lirbary(data.table)

dt <- data.table(id=c("A", "A", "B"),
                 amount=c(100, 10, 1),
                 category=c("cat1", "cat2", "cat1"))

dt[, conditional_amount := sum(.SD[category == "cat1"]$amount), by=.(id)]
dt

result:

   id amount category conditional_amount
1:  A    100     cat1                100
2:  A     10     cat2                100
3:  B      1     cat1                  1

What is the equivalent of this in pandas? When I do this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import pandas as pd

df = pd.DataFrame({"id": ["A", "A", "B"],
                   "amount": [100, 10, 1],
                   "category": ["cat1", "cat2", "cat1"]})

g = df.groupby(["id"]).apply(lambda x: x.query("category == 'cat1'")["amount"].sum())

result:

id
A    100
B      1
dtype: int64

I then need to add this back to the original DataFrame:

g = g.reset_index()
g.columns = ["id", "conditional_amount"]
df = df.merge(g, on="id")
df

result:

  id  amount category  conditional_amount
0  A     100     cat1                 100
1  A      10     cat2                 100
2  B       1     cat1                   1

Is there a more succinct way of doing that? It feels like there must be, but because it’s a conditional sum I cannot use .transform here. Thanks!

>Solution :

You can mask the Series before using groupby.transform:

df['conditional_amount'] = (df['amount']
                            .where(df['category'].eq('cat1'))
                            .groupby(df['id'])
                            .transform('sum')
                           )

output:

  id  amount category  conditional_amount
0  A     100     cat1               100.0
1  A      10     cat2               100.0
2  B       1     cat1                 1.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading