I have a dataframe with a column called url. I want to select all the rows whose url does not contain "www.ebay.com". I have tried this:
%python
flutten_df.printSchema()
display(flutten_df[flutten_df['url'].str.contains("www.ebay.com")])
It gives me this error:
AnalysisException: Can't extract value from url#75009: need struct type but got string;
The schema is:
root
|-- web: string (nullable = true)
|-- url: string (nullable = true)
How can I fix this problem?
>Solution :
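You're trying to use pandas syntax on a Spark DataFrame.
In PySpark, flutten_df['url'].str means "get the struct field named str from the column url". That is why the error says it can't extract a value from a column that is not a struct.
Use filter with rlike instead, negating the condition with ~ to keep only the rows that do not match. Note that rlike takes a regular expression, so each unescaped . matches any character: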
display(flutten_df.filter(~flutten_df['url'].rlike("www.ebay.com")))