How to reference a custom class in a pandas query

As part of my query, I want to instantiate a custom class and then use that instance in the query. In the example below I use a trivial identity class that does nothing.

import pandas as pd

class foo:
    def __init__(self, var):
        self.v = var

# This is meant to turn every row in col1 into a foo object, read its v attribute,
# and compare it to 1. It is silly here, but it is a minimal working example.
q = '@foo(col1).v > 1'

df = pd.DataFrame({'col1':[1,2]})
df.query(q)

When I run this, I get an error saying that the resolver could not find 'foo'. Specifically:

KeyError                                  Traceback (most recent call last)
File ~/kits/miniconda3/envs/dev/lib/python3.10/site-packages/pandas/core/computation/scope.py:198, in Scope.resolve(self, key, is_local)
    197 if self.has_resolvers:
--> 198     return self.resolvers[key]
    200 # if we're here that means that we have no locals and we also have
    201 # no resolvers

File ~/kits/miniconda3/envs/dev/lib/python3.10/collections/__init__.py:982, in ChainMap.__getitem__(self, key)
    981         pass
--> 982 return self.__missing__(key)

File ~/kits/miniconda3/envs/dev/lib/python3.10/collections/__init__.py:974, in ChainMap.__missing__(self, key)
    973 def __missing__(self, key):
--> 974     raise KeyError(key)

KeyError: 'foo'

I also tried passing foo to the query function through the local_dict and global_dict arguments, but both gave me the same error.

I expected the query to instantiate the class inline and then evaluate the boolean expression.

Solution:

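One option is to wrap the class in a plain function, get_foo(var), and call that function from the query instead. The likely reason this helps is that pandas appears to treat an @-prefixed name bound to a class as non-local, so it looks for foo among the DataFrame's columns and index rather than in the enclosing scope; a name bound to a function is resolved normally.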

import pandas as pd


class foo:
    def __init__(self, var):
        self.v = var


def get_foo(var):
    return foo(var)

q = "@get_foo(col1).v > 1"
df = pd.DataFrame({"col1": [1, 2]})

x = df.query(q)
print(x)

Prints:

   col1
1     2
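
As a rough check of what the wrapper actually receives, the sketch below logs the type of its argument and compares the query result with an ordinary boolean mask. The received list and the mask variable are names introduced here for illustration and are not part of the answer above. The expectation is that get_foo is called with the whole col1 Series rather than once per row, so foo(...).v holds a Series and the comparison is element-wise.

```python
import pandas as pd


class foo:
    def __init__(self, var):
        self.v = var


# Hypothetical log of the argument types the wrapper is called with.
received = []


def get_foo(var):
    # Record what pandas passes in, then build the foo object as before.
    received.append(type(var).__name__)
    return foo(var)


df = pd.DataFrame({"col1": [1, 2]})

# Same query as in the answer: the class is reached through the wrapper function.
result = df.query("@get_foo(col1).v > 1")

# Equivalent filter without query(): build foo from the column and use its
# v attribute directly as a boolean mask.
mask = df[foo(df["col1"]).v > 1]

print(received)
print(result.equals(mask))
```

If the two frames match, the query is behaving like ordinary vectorized filtering, so the mask form may be the simpler choice whenever the expression does not have to be a string.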
