How to specify different types of DataFrames in Python?

Let's say I have a PySpark DataFrame that I think of as "Users".
Then I have another one that I think of as "Cars".

Now let's say I have a function that returns a DataFrame of type "Cars".

Usually I see code like this:

from pyspark.sql import DataFrame

def get_cars() -> DataFrame:
    pass

However, "DataFrame" is not very expressive; it is too generic. Is it possible to write something like this, using an alias or something similar?

def get_data() -> Cars: 
    pass

>Solution:

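You could use a type alias:

import pandas as pd
from typing import TypeAlias  # Python 3.10+

Cars: TypeAlias = pd.DataFrame  # works the same with pyspark.sql.DataFrame

def get_data() -> Cars:
    ...

print(Cars) # <class 'pandas.core.frame.DataFrame'>

Avoid type[pd.DataFrame] for this. builtins.type supports subscripting ([]) since PEP 585 (see Generic Alias Type), but type[pd.DataFrame] describes the DataFrame class itself rather than a DataFrame instance, so it is the wrong annotation for a function that returns a DataFrame.

Source: [docs]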
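An alias is only another name for DataFrame, so a type checker treats "Cars" and "Users" as interchangeable. If you want them to be genuinely different types, typing.NewType gives each one its own nominal type for static checkers such as mypy, while at runtime the value is still a plain DataFrame. A minimal sketch, using pandas so it runs without a Spark session (pyspark.sql.DataFrame can be substituted); the helper count_users is hypothetical:

```python
import pandas as pd
from typing import NewType

# Distinct types for a static checker; at runtime Cars(df) just returns df.
Cars = NewType("Cars", pd.DataFrame)
Users = NewType("Users", pd.DataFrame)


def get_cars() -> Cars:
    df = pd.DataFrame({"Model": ["Lambo"], "Year": [2023]})
    return Cars(df)  # wrap to mark this DataFrame as "Cars"


def count_users(users: Users) -> int:
    # Hypothetical helper that only accepts "Users" DataFrames
    return len(users)


cars = get_cars()
# count_users(cars)  # a static type checker reports an incompatible argument here
print(type(cars).__name__)  # DataFrame
```

Like an alias, this says nothing about the columns; it only stops you from passing a "Cars" DataFrame where "Users" is expected.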

From the comments:

.. this type hint won't say anything about the columns this Cars DataFrame contains.

In that case, you could use pandera (which also supports PySpark SQL DataFrames):

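# pip install pandera
import pandas as pd
import pandera as pa

# Schema describing the columns a "Cars" DataFrame must have
Cars = pa.DataFrameSchema({
    "Model": pa.Column(pa.String),
    "Year": pa.Column(pa.Int),
})

# Note: Cars is a schema object, not a type. The annotation documents intent,
# but type checkers such as mypy won't accept it as a type, and it validates nothing.
def get_cars() -> Cars:
    return pd.DataFrame({
        "Model": ["Lambo", "Porsche", "Mustang"],
        "Year": [2023, 2000, 2010],
    })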

Inspecting the schema's column types:

print(Cars.dtypes) # {'Model': DataType(str), 'Year': DataType(int64)}

To check that a DataFrame actually matches the schema, call Cars.validate:

df = get_cars()

try:
    # Cast Year to float on purpose so that validation fails
    Cars.validate(df.astype({"Year": "float"}))
except pa.errors.SchemaError as e:
    print(f"WRONG SCHEMA: {e}")

# WRONG SCHEMA: expected series 'Year' to have type int64, got float64