Example dataframe, also available as a fiddle:
import pandas as pd
d = {
"year": [2021, 2021, 2021, 2021, 2022, 2022, 2022, 2023, 2023, 2023, 2023],
"type": ["A", "B", "B", "A", "A", "B", "A", "B", pd.NA, "B", "A"],
"observation": [22, 11, 67, 44, 2, 16, 78, 9, 10, 11, 45]
}
df = pd.DataFrame(d)
df_pivot = pd.pivot_table(
df,
values="observation",
index="year",
columns="type",
aggfunc="count"
)
The pivot table produces the desired output by count (this is intentional, I do not want the sum of observations, I want a row count):
>>> print(df_pivot)
type A B
year
2021 2 2
2022 2 1
2023 1 2
However, I would like to show the percentage divided into total for each row by types "A" and "B" (the values of the "type" column in the dataframe). Note that not all rows have a type, some are NA (one is NA in this sample data to illustrate this). It’s fine to ignore these unpopulated values in calculations. This also means that the "total" may be different in each row and is based on the sum of counted values in each type (i.e., count of A + count of B for each year).
I have tried multiple ways but it only seems to work when I isolate each specific type one at a time. I have not been able to figure out how to do it where it has similar output only showing the percentage of the total instead of the count. My lambda functions for aggfunc seem to result in incorrect values not reflective of the correct percentages.
Example desired output:
>>> print(df_desired_output)
type A B
year
2021 0.50 0.50
2022 0.66 0.33
2023 0.33 0.66
How do I get this desired output?
>Solution :
Division should do it:
df_pivot.div(df_pivot.sum(1),axis=0).round(2)
type A B
year
2021 0.50 0.50
2022 0.67 0.33
2023 0.33 0.67