I have a pandas dataframe that looks like this:
import pandas as pd
df = pd.DataFrame({'id':[1, 1, 2, 2], 'comp': [-0.10,0.20,-0.10, 0.4], 'word': ['boy','girl','man', 'woman']})
I would like to group the dataframe by id, calculate the sum of comp for each group, and add a new column called n_obs that records how many rows were summed for each id.
I tried df.groupby('id').sum(), but it does not give me the row count I need.
I'd like the output in the form below:
id comp n_obs
1 0.1 2
2 0.3 2
Any suggestions on how I can do this?
Solution:
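You can use .groupby() with .agg() and named aggregation, where each keyword becomes an output column and maps to a (source column, function) pair: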
df.groupby("id").agg(comp=("comp", "sum"), n_obs=("id", "count"))
This outputs:
comp n_obs
id
1 0.1 2
2 0.3 2
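If you want id back as a regular column, as in the layout shown in the question, a minimal sketch is to call .reset_index() on the result. The variable name result is just for illustration. This version also uses "size" instead of "count" as a variation: "count" skips missing values in the column it is applied to, while "size" counts every row in the group. For the sample data both give the same numbers.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "comp": [-0.10, 0.20, -0.10, 0.4],
    "word": ["boy", "girl", "man", "woman"],
})

# Named aggregation: sum comp per id and count the rows in each group.
# "size" counts all rows, including any with NaN in comp.
result = (
    df.groupby("id")
      .agg(comp=("comp", "sum"), n_obs=("comp", "size"))
      .reset_index()  # turn the id index back into an ordinary column
)

print(result)
```

The sums are floating-point values, so compare them with a tolerance rather than with == if you need to check them programmatically.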