How to count the number of distinct MultiIndex entries in pandas by only one of the index components

I have a dataframe that looks like this:
Input dataframe

I want to find the contribution of each category to the Price(USD) column by day. So far I’ve tried aggregating by Timestamp and Category, with the sum of Price(USD):

df3 = df.groupby(["Timestamp", "Category"]).sum()

This gives the following dataset:

Dataset grouped by Timestamp and Category

From here, I haven’t been able to apply a function to each row that divides its Price(USD) by the sum over all categories for that day and stores the result in a new column.

Ideally, a new column "Percentage" would contain:

Percentage

  1. 0.3/(0.3+0.2+0.1)
  2. 0.2/(0.3+0.2+0.1)
  3. 0.1/(0.3+0.2+0.1)

With the same pattern for the rest of the dataframe.

Thank you

Solution:

It seems you need to divide each per-category sum by the total for its day:

>>> df.groupby(["Timestamp", "Category"]).sum() / df.groupby(["Timestamp"]).sum()
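
A more explicit variant of the same idea, as a minimal sketch: the original dataframe is only shown as an image, so the sample values, dates, and category labels below are hypothetical, as is the `daily_total` name. It assumes the columns are called `Timestamp`, `Category`, and `Price(USD)` as in the question, and uses `transform("sum")` on the `Timestamp` index level to broadcast each day's total back onto every row.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe is not available in the post.
df = pd.DataFrame(
    {
        "Timestamp": ["2021-01-01"] * 3 + ["2021-01-02"] * 2,
        "Category": ["A", "B", "C", "A", "B"],
        "Price(USD)": [0.3, 0.2, 0.1, 0.5, 0.5],
    }
)

# Sum the price per (day, category); selecting the column keeps the result numeric.
df3 = df.groupby(["Timestamp", "Category"])[["Price(USD)"]].sum()

# Total per day, repeated on every (day, category) row of that day.
daily_total = df3.groupby(level="Timestamp")["Price(USD)"].transform("sum")

# Share of each category in its day's total.
df3["Percentage"] = df3["Price(USD)"] / daily_total
print(df3)
```

Selecting `Price(USD)` explicitly avoids pulling non-numeric columns into the sums, and the percentages within each day should add up to 1.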