KeyError(key) in get_loc_level after using .transform() or apply()

I have a large data frame that I group by several keys, and I'm trying to filter rows within each group. To keep it short, here is a simplified data frame that reproduces the error. df5 is grouped by "Detail", "ID" and "Year".

import pandas as pd

data2 = {
    "Year": ["2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012", "2012"],
    "Country": ["USA", "USA", "USA", "USA", "USA", "USA", "USA", "CANADA", "CANADA"],
    "Country_2": ["", "", "", "", "", "", "", "USA", "USA"],
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "Second_country_available": [False, False, False, False, False, False, False, True, True],
}
df5 = pd.DataFrame(data2)

# Build the lookup only from rows that have a second country
df5_true = df5["Second_country_available"] == True
Country_2_gr = df5[df5_true].groupby(["Detail", "ID", "Year"])["Country_2"].agg("|".join)
Country_2_gr

# Match each group's Country values against that group's Country_2 pattern
grouped_df5 = df5.groupby(["Detail", "ID", "Year"], group_keys=False)["Country"]
# Raises KeyError for any group that has no entry in Country_2_gr
filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr[g.name]))
filtered

This raises the following error:

return (self._engine.get_loc(key), None)
  File "pandas\_libs\index.pyx", line 774, in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc
KeyError: (1, 'A15', '2012')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "packages\pandas\core\indexes\.py", line 3045, in _get_loc_level
    raise KeyError(key) from err
KeyError: (1, 'A15', '2012')

The code works in most cases, so I don't want to change it radically. I would like a fix that drops the rows in cases like the one shown above.

Solution:

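Country_2_gr is built from the filtered data frame (only the rows where Second_country_available is True), so it does not contain a key for every group; (1, 'A15', '2012') is one of the missing keys. Indexing with Country_2_gr[g.name] therefore raises KeyError for those groups. With an empty pattern as the fallback, fullmatch returns False for any non-empty Country value, so those rows end up excluded by the mask. You can switch to get with a default: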

filtered = grouped_df5.transform(lambda g: g.str.fullmatch(Country_2_gr.get(g.name, default="")))
filtered
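
For completeness, here is a minimal end-to-end sketch of the same idea on the sample data from the question. The names keys, missing_key, key_is_present and kept are illustrative and not part of the original code, and the final indexing step assumes that "dropping the rows" means keeping only the rows where the mask is True.

```python
import pandas as pd

data2 = {
    "Year": ["2012"] * 9,
    "Country": ["USA"] * 7 + ["CANADA"] * 2,
    "Country_2": [""] * 7 + ["USA"] * 2,
    "ID": ["AF12", "A15", "BU14", "DU157", "L12", "N10", "RU156", "DU157", "RU156"],
    "Detail": [1] * 9,
    "Second_country_available": [False] * 7 + [True] * 2,
}
df5 = pd.DataFrame(data2)
keys = ["Detail", "ID", "Year"]  # hypothetical helper name

# Lookup of regex patterns, built only from rows that have a second country
Country_2_gr = (
    df5[df5["Second_country_available"]]
    .groupby(keys)["Country_2"]
    .agg("|".join)
)

# Groups without any such row are absent from the lookup index
missing_key = (1, "A15", "2012")
key_is_present = missing_key in Country_2_gr.index

# Series.get falls back to "" instead of raising KeyError for absent groups
filtered = df5.groupby(keys, group_keys=False)["Country"].transform(
    lambda g: g.str.fullmatch(Country_2_gr.get(g.name, default=""))
)

# Keep only rows whose Country fully matches the group's pattern
kept = df5[filtered.astype(bool)]
print(kept)
```

The astype(bool) call is a defensive choice in case the transformed result does not come back with a boolean dtype in a given pandas version. Note that the joined Country_2 values are interpreted as a regular expression by fullmatch, so values containing regex metacharacters would need escaping (for example with re.escape) before joining.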