In Pandas, how to reference and use a value from a dictionary based on column AND index values in a dataframe?

I have data on how many times people are sick in certain locations (locations A and B) at certain times (an index of dates). I need to divide each value by the population of that location (column) AND at that time (index), which is stored in a separate dictionary.

Example DataFrame:

import pandas as pd
data = [{'A': 1, 'B': 3}, {'A': 2, 'B': 20}, {'A': "Unk", 'B': 50}]
df = pd.DataFrame(data, index=[pd.to_datetime("2019-12-31")
                               , pd.to_datetime("2020-12-30")
                               , pd.to_datetime("2020-12-31")])
Out:
            A   B
2019-12-31  1   3
2020-12-30  2   20
2020-12-31  Unk 50

Population dictionary (location_year):

dic = {"A_2019": 100, "B_2019": 200, "A_2020": 120, "B_2020": 150}

While it’s not necessary to have the output in the same df, the output I’m trying to achieve would be:

            A   B    A1     B1
2019-12-31  1   3   0.01    0.015
2020-12-30  2   20  0.017   0.133
2020-12-31  Unk 50  nan     0.333

I’ve tried lots of different approaches, but almost always get an unhashable type error.

for col in df.columns:
    df[col + "1"] = df[col]/dic[col + "_" + df.index.strftime("%Y")]
Out: `TypeError: unhashable type: 'Index'`

I suspect I don't understand how pandas passes the df.index values to the dictionary lookup. Can this be fixed, or is a different approach needed?
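The error occurs because `col + "_" + df.index.strftime("%Y")` is not one string but a whole Index of keys (one per row), and a dictionary can only be indexed with a single hashable key; an Index is not hashable. A minimal sketch that keeps the question's loop but looks the keys up one row at a time, assuming that non-numeric entries such as "Unk" should become NaN (done here with `pd.to_numeric(errors="coerce")`, a swap-in rather than the answer's method):

```python
import pandas as pd

data = [{'A': 1, 'B': 3}, {'A': 2, 'B': 20}, {'A': "Unk", 'B': 50}]
df = pd.DataFrame(data, index=pd.to_datetime(["2019-12-31", "2020-12-30", "2020-12-31"]))
dic = {"A_2019": 100, "B_2019": 200, "A_2020": 120, "B_2020": 150}

years = df.index.strftime("%Y")  # an Index of year strings, one per row

# Reproduce the original problem: "A_" + years is a whole Index of keys,
# and a dict lookup needs a single hashable key
try:
    dic["A_" + years]
except TypeError as err:
    err_msg = str(err)
print(err_msg)

# Coerce 'Unk' (and any other non-numeric entry) to NaN so division works
df = df.apply(pd.to_numeric, errors="coerce")

for col in list(df.columns):  # copy the labels, since the loop adds columns
    # Look up the population for each row's year, one string key at a time
    pop = [dic[f"{col}_{y}"] for y in years]
    df[col + "1"] = df[col] / pop

print(df)
```

This stays close to the original attempt; it is fine for a handful of columns, while the reshape-and-divide approach below avoids Python-level lookups altogether.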

Solution:

You can build a Series from your dictionary, unstack it into a DataFrame (one row per year, one column per location), reindex it so it has one row per date, then divide and join the result back:

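def split(k):
    # turn a key like 'A_2019' into the tuple (2019, 'A')
    x, y = k.split('_')
    return (int(y), x)

# replace the 'Unk' string with a real missing value (pd.NA),
# then let convert_dtypes pick nullable numeric dtypes
df = df.replace('Unk', pd.NA).convert_dtypes()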

# reshape to match the original DataFrame
tmp = (pd.Series({split(k): v for k, v in dic.items()})
         .unstack()
         .reindex(df.index.year)
         .set_axis(df.index)
      )

# divide, add_suffix, join
out = df.join(tmp.rdiv(df).add_suffix('1'))

Output:

               A   B        A1        B1
2019-12-31     1   3      0.01     0.015
2020-12-30     2  20  0.016667  0.133333
2020-12-31  <NA>  50      <NA>  0.333333

Intermediate tmp:

              A    B
2019-12-31  100  200
2020-12-30  120  150
2020-12-31  120  150
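To see what each stage of the reshape does, here is a minimal sketch that runs the steps separately on the question's dictionary and dates. The names `pop` and `by_year` are illustrative only and do not appear in the answer above; the key parsing is inlined rather than using `split`.

```python
import pandas as pd

dic = {"A_2019": 100, "B_2019": 200, "A_2020": 120, "B_2020": 150}
index = pd.to_datetime(["2019-12-31", "2020-12-30", "2020-12-31"])

# Step 1: split each "location_year" key into a (year, location) tuple;
# a dict with tuple keys becomes a Series with a two-level MultiIndex
pop = pd.Series({(int(k.split("_")[1]), k.split("_")[0]): v for k, v in dic.items()})

# Step 2: unstack the inner level (location) into columns -> one row per year
by_year = pop.unstack()

# Step 3: repeat each year's row once per date, then relabel the rows with the dates
tmp = by_year.reindex(index.year).set_axis(index)

print(by_year)
print(tmp)
```

Because `tmp` ends up with the same index and column labels as the data, `rdiv` aligns the two frames by label, so each cell is divided by the population for its own location and year.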