Is there a way to search a struct in Spark?

In Spark Scala, I have a DataFrame where one of the columns is a struct type (the State column in the table below):

JobID State
1 {"life_cycle_state": "RUNNING", "state_message": "In run"}
2 {"life_cycle_state":"INTERNAL_ERROR","state_message":"Notebook not found"}

Now, I want to filter the DataFrame to the rows whose State column contains ERROR.

The line of code I've tried:

val errorJobRunsDF = df.filter(col("state").rlike("ERROR")).select("JobID")

and it failed because rlike doesn't work on a struct type, with the error below:

Cannot resolve 'state RLIKE 'ERROR'' due to data type mismatch: argument 1 requires string type, however, 'state' is of struct<life_cycle_state:string,state_message:string> type.

Kindly suggest some workarounds.

Solution:

In Spark, if you have a DataFrame with a struct column, you can search for specific values within the struct by using the filter() function together with dot notation to reference the individual struct fields:

from pyspark.sql.functions import col

# Replace 'search_value' with the value you want to search for
search_value = "search_value"

# Filter the DataFrame based on the search value
filtered_df = df.filter(col("myStruct.field1") == search_value)

# Show the filtered DataFrame
filtered_df.show()