In Spark Scala, I have a DataFrame where one of the columns is of struct type (the `state` column in the table below).
| JobID | State |
|---|---|
| 1 | {"life_cycle_state": "RUNNING", "state_message": "In run"} |
| 2 | {"life_cycle_state": "INTERNAL_ERROR", "state_message": "Notebook not found"} |
Now, I want to filter the DataFrame to the rows whose `state` column contains `ERROR`. The line of code I've tried:
```scala
val errorJobRunsDF = df.filter(col("state").rlike("ERROR")).select("JobID")
```
It failed because `rlike` does not work on a struct column, with this error:

```
Cannot resolve '`state` RLIKE 'ERROR'' due to data type mismatch: argument 1 requires string type, however, '`state`' is of struct<life_cycle_state:string,state_message:string> type.
```
Kindly suggest some workarounds.
> Solution:
In Spark, if you have a DataFrame with a struct column, you can filter on a value nested inside the struct by referencing the field with dot notation (`struct_col.field_name`) in `filter()`. The example below is PySpark; the same dot-notation path works in Scala.

```python
from pyspark.sql.functions import col

# Replace 'search_value' with the value you want to search for
search_value = "search_value"

# Filter the DataFrame on a field nested inside the struct column
filtered_df = df.filter(col("myStruct.field1") == search_value)

# Show the filtered DataFrame
filtered_df.show()
```
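Applied to the question's schema, the fix in Scala is to point `rlike` at the string field inside the struct rather than at the struct itself. A minimal sketch, assuming a local `SparkSession` and rebuilding sample data to match the table above (the constructed `df` is for illustration only):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

object FilterStructField {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filter-struct").getOrCreate()

    // Rebuild a DataFrame with the question's schema:
    // JobID: int, state: struct<life_cycle_state: string, state_message: string>
    val df = Seq(
      (1, "RUNNING", "In run"),
      (2, "INTERNAL_ERROR", "Notebook not found")
    ).toDF("JobID", "life_cycle_state", "state_message")
      .select(col("JobID"), struct(col("life_cycle_state"), col("state_message")).as("state"))

    // rlike works once it is given the string field inside the struct
    val errorJobRunsDF = df
      .filter(col("state.life_cycle_state").rlike("ERROR"))
      .select("JobID")

    errorJobRunsDF.show()
    spark.stop()
  }
}
```

If you actually need to search the whole struct (every field at once), one option is to serialize it first, e.g. `df.filter(to_json(col("state")).rlike("ERROR"))`, at the cost of also matching the text of the other fields.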