Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Show percentage of total in pandas pivot table with multiple columns based on single field in dataframe

Example dataframe, also available as a fiddle:

import pandas as pd

d = {
  "year": [2021, 2021, 2021, 2021, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
  "type": ["A", "B", "B", "A", "A", "B", "A", "B", pd.NA, "B", "A"],
  "observation": [22, 11, 67, 44, 2, 16, 78, 9, 10, 11, 45]
}
df = pd.DataFrame(d)

df_pivot = pd.pivot_table(
  df,
  values="observation",
  index="year",
  columns="type",
  aggfunc="count"
)

The pivot table produces the desired output by count (this is intentional, I do not want the sum of observations, I want a row count):

>>> print(df_pivot)

type  A  B
year      
2021  2  2
2022  2  1
2023  1  2

However, I would like to show the percentage divided into total for each row by types "A" and "B" (the values of the "type" column in the dataframe). Note that not all rows have a type, some are NA (one is NA in this sample data to illustrate this). It’s fine to ignore these unpopulated values in calculations. This also means that the "total" may be different in each row and is based on the sum of counted values in each type (i.e., count of A + count of B for each year).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have tried multiple ways but it only seems to work when I isolate each specific type one at a time. I have not been able to figure out how to do it where it has similar output only showing the percentage of the total instead of the count. My lambda functions for aggfunc seem to result in incorrect values not reflective of the correct percentages.

Example desired output:

>>> print(df_desired_output)

type     A     B
year      
2021  0.50  0.50
2022  0.66  0.33
2023  0.33  0.66

How do I get this desired output?

>Solution :

Division should do it:

df_pivot.div(df_pivot.sum(1),axis=0).round(2)
type     A     B
year            
2021  0.50  0.50
2022  0.67  0.33
2023  0.33  0.67
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading