Home Show percentage of total in pandas pivot table with multiple columns based on single field in dataframe

Questions

Show percentage of total in pandas pivot table with multiple columns based on single field in dataframe

October 19, 2023

Example dataframe, also available as a fiddle:

import pandas as pd

d = {
  "year": [2021, 2021, 2021, 2021, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
  "type": ["A", "B", "B", "A", "A", "B", "A", "B", pd.NA, "B", "A"],
  "observation": [22, 11, 67, 44, 2, 16, 78, 9, 10, 11, 45]
}
df = pd.DataFrame(d)

df_pivot = pd.pivot_table(
  df,
  values="observation",
  index="year",
  columns="type",
  aggfunc="count"
)

The pivot table produces the desired output by count (this is intentional, I do not want the sum of observations, I want a row count):

>>> print(df_pivot)

type  A  B
year      
2021  2  2
2022  2  1
2023  1  2

However, I would like to show the percentage divided into total for each row by types "A" and "B" (the values of the "type" column in the dataframe). Note that not all rows have a type, some are NA (one is NA in this sample data to illustrate this). It’s fine to ignore these unpopulated values in calculations. This also means that the "total" may be different in each row and is based on the sum of counted values in each type (i.e., count of A + count of B for each year).

I have tried multiple ways but it only seems to work when I isolate each specific type one at a time. I have not been able to figure out how to do it where it has similar output only showing the percentage of the total instead of the count. My lambda functions for aggfunc seem to result in incorrect values not reflective of the correct percentages.

Example desired output:

>>> print(df_desired_output)

type     A     B
year      
2021  0.50  0.50
2022  0.66  0.33
2023  0.33  0.66

How do I get this desired output?

>Solution :

Division should do it:

df_pivot.div(df_pivot.sum(1),axis=0).round(2)
type     A     B
year            
2021  0.50  0.50
2022  0.67  0.33
2023  0.33  0.67

pandas

byMR

Published October 19, 2023

Add a comment

Which framework to choose to create desktop application in .NET?

byMR

October 19, 2023

Questions

Linux – How to diff all files which have same name but different extensions in a directory

byMR

October 19, 2023

Questions

How to Implement Trie Delete Function Without Overlapping Error

byMR

October 19, 2023

Questions

R How to calculate percentage depending on the 2 different categorical variables?

byMR

October 19, 2023

Questions

Iterate over different lists

byMR

October 19, 2023

Questions

Flutter generate list of widget in ternary operator prompt "Expected an Identifier"

byMR

October 19, 2023

Show percentage of total in pandas pivot table with multiple columns based on single field in dataframe

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Which framework to choose to create desktop application in .NET?

Linux – How to diff all files which have same name but different extensions in a directory

How to Implement Trie Delete Function Without Overlapping Error

R How to calculate percentage depending on the 2 different categorical variables?

Iterate over different lists

Flutter generate list of widget in ternary operator prompt "Expected an Identifier"

Keep Up to Date with the Most Important News

Show percentage of total in pandas pivot table with multiple columns based on single field in dataframe

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Which framework to choose to create desktop application in .NET?

Linux – How to diff all files which have same name but different extensions in a directory

How to Implement Trie Delete Function Without Overlapping Error

R How to calculate percentage depending on the 2 different categorical variables?

Iterate over different lists

Flutter generate list of widget in ternary operator prompt "Expected an Identifier"

Discover more from Dev solutions