I am getting this error again and again (TypeError: unsupported operand type(s) for -: 'datetime.date' and 'str')

The code:

from pyspark.sql.functions import *
start_date_str = dbutils.widgets.get("startdate")
start_date = to_date(lit(start_date_str), 'yyyy-MM-dd')
end_date = add_months(start_date_str, 12 * 10)
end_date_str = (spark
    .range(1)
    .select(add_months(start_date, 12 * 10).alias("end_date"))
    .collect()[0]["end_date"])
date_range = spark.range(0, (end_date_str - start_date_str).days, 1) \
    .withColumn("date", date_add(lit(start_date_str), col("id").cast("date")))
dim_time = date_range \
    .withColumn("day_of_week", date_format(col("date"), "EEEE")) \
    .withColumn("current_day", when(col("date") == current_date(), 1).otherwise(0))…
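The subtraction fails because end_date_str comes back from collect() as a datetime.date while start_date_str is still the raw widget string, which is exactly the mix the TypeError complains about. A minimal sketch of one way around it, keeping the startdate widget from the question and building the whole date range inside Spark with sequence/explode so no Python-side date arithmetic is needed:

from pyspark.sql.functions import (
    col, lit, expr, to_date, add_months, sequence, explode,
    date_format, when, current_date,
)

start_date_str = dbutils.widgets.get("startdate")   # Databricks widget, e.g. "2020-01-01"
start_col = to_date(lit(start_date_str), "yyyy-MM-dd")

# One row per day from the start date to ten years later, computed entirely in Spark.
date_range = (
    spark.range(1)
    .select(explode(sequence(start_col,
                             add_months(start_col, 12 * 10),
                             expr("interval 1 day"))).alias("date"))
)

dim_time = (
    date_range
    .withColumn("day_of_week", date_format(col("date"), "EEEE"))
    .withColumn("current_day", when(col("date") == current_date(), 1).otherwise(0))
)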

Create a new column in Spark dataframe that is a list of other column values

I have a dataframe called 'df' structured as follows:

ID   name   lv1    lv2
abb  name1  40.34  21.56
bab  name2  21.30  67.45
bba  name3  32.45  45.44

In Pandas, I can use the following code to create a new column that contains a list of the lv1 and lv2 values:

cols = ['lv1', 'lv2']
df['new_col'] = df[cols].values.tolist()…
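In Spark the usual counterpart is array(), which packs several columns into a single array-typed column rather than a Python list. A minimal sketch, assuming the same df with columns lv1 and lv2:

from pyspark.sql.functions import array, col

# Collect the chosen columns into one ArrayType column, analogous to values.tolist().
cols = ['lv1', 'lv2']
df = df.withColumn('new_col', array(*[col(c) for c in cols]))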

Can't extract value from <> need struct type but got string;

I have some nested JSON that I have parallelized and written back out as JSON. A complete record would look like:

{
  "id": "1",
  "type": "site",
  "attributes": {
    "description": "Number 1 Park",
    "activeInactive": {
      "text": "Active",
      "colour": "#4CBB17"
    },
    "lastUpdated": "2019-12-05T08:51:39"
  },
  "relationships": {
    "region": {
      "data": {
        "type": "region",
        "id": "1061",
        "meta": {
          "displayValue": "Park Region"
        }
      }
    }
  }
}

However, the data is pending a data cleanse…
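That "need struct type but got string" error typically appears when a column still holds the raw JSON text, so nested access like attributes.description has nothing structured to drill into. A minimal sketch of the usual remedy, assuming (this is an assumption, not from the excerpt) the records sit in a string column named value of a DataFrame raw_df: parse them with from_json and an explicit schema, after which struct access works:

from pyspark.sql.functions import from_json, col

# Assumed DDL schema covering only the fields used below; extra JSON fields are ignored.
schema = ("id STRING, type STRING, "
          "attributes STRUCT<description: STRING, "
          "activeInactive: STRUCT<text: STRING, colour: STRING>, "
          "lastUpdated: STRING>")

parsed = (
    raw_df                                   # hypothetical DataFrame with a string column "value"
    .withColumn("record", from_json(col("value"), schema))
    .select(col("record.id"),
            col("record.attributes.description"),
            col("record.attributes.activeInactive.text"))
)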

filter a column using spark databricks dataframe

I have a dataframe with a column called url, and I want to select all the rows whose url does not contain the word "www.ebay.com". I have tried this:

%python
display(flutten_df.printSchema())
display(flutten_df[flutten_df['url'].str.contains("www.ebay.com")])

It gives me this error:

AnalysisException: Can't extract value from url#75009: need struct type but got string;

the schema is :…
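Spark Columns have no pandas-style .str accessor; flutten_df['url'].str is read as an attempt to pull a field named str out of the url column, which is what produces the "need struct type but got string" message. Column.contains plus negation does the filtering directly. A minimal sketch, keeping the question's DataFrame name flutten_df:

from pyspark.sql.functions import col

# Keep only rows whose url does NOT contain "www.ebay.com".
not_ebay_df = flutten_df.filter(~col("url").contains("www.ebay.com"))
display(not_ebay_df)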