How to count the number of distinct multiline index in pandas, only by one of the indices components

I have a dataframe that looks like this:
Input dataframe

I want to find the contribution of each category to the Price(USD) column by day. So far I’ve tried aggregating by Timestamp and Category, with the sum of Price(USD):

df3 = df.groupby(["Timestamp", "Category"]).sum()

Obtaining the following dataset:

Dataset grouped by Timestamp and Category

After this point, I haven’t been able to apply a function to each row to divide each Price(USD) by the sum of all different categories in each day and create a new column with these values.

Ideally, a new column "Percentage" would contain :

Percentage

  1. 0.3/(0.3+0.2+0.1)
  2. 0.2/(0.3+0.2+0.1)
  3. 0.1/(0.3+0.2+0.1)

With the same pattern for the rest of the dataframe.

Thank you

>Solution :

Seems like you need

>>> df.groupby(["Timestamp", "Category"]).sum() / df.groupby(["Timestamp"]).sum()

Leave a Reply