I have a Polars dataframe like
df = pl.DataFrame({
"tags": ['{"ref":"@1", "area": "livingroom", "type": "elec"}', '{"ref":"@2", "area": "kitchen"}', '{"ref":"@3", "type": "elec"}'],
"name": ["a", "b", "c"],
})
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬βββββββ
β tags β name β
β --- β --- β
β str β str β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββͺβββββββ‘
β {"ref":"@1", "area": "livingroom", "type": "elec"} β a β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββΌβββββββ€
β {"ref":"@2", "area": "kitchen"} β b β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββΌβββββββ€
β {"ref":"@3", "type": "elec"} β c β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄βββββββ
What I would to do is create a filter function that filters dataframe based on the tags column. Particularly, I would like to only be left with rows where the tags column has an area key and a type key that has a value "elec".
How can I achieve this (ideally using the native expressions API)?
>Solution :
pl.Expr.str.json_path_match can be used to extract the first match of the JSON string with the a suitable path expression.
(
df
.filter(
pl.col("tags").str.json_path_match("$.area").is_not_null(),
pl.col("tags").str.json_path_match("$.type") == "elec",
)
)
shape: (1, 2)
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ¬βββββββ
β tags β name β
β --- β --- β
β str β str β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββͺβββββββ‘
β {"ref":"@1", "area": "livingroom", "type": "elec"} β a β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββ΄βββββββ
More generally, pl.Expr.str.json_decode ca be used to obtain a struct column with the information of the JSON. This struct can be unnested and used for any downstream filtering operation.
(
df
.with_columns(
pl.col("tags").str.json_decode()
)
.unnest("tags")
)
shape: (3, 4)
βββββββ¬βββββββββββββ¬βββββββ¬βββββββ
β ref β area β type β name β
β --- β --- β --- β --- β
β str β str β str β str β
βββββββͺβββββββββββββͺβββββββͺβββββββ‘
β @1 β livingroom β elec β a β
β @2 β kitchen β null β b β
β @3 β null β elec β c β
βββββββ΄βββββββββββββ΄βββββββ΄βββββββ