Home PySpark Error: cannot resolve '`timestamp`'

Questions

PySpark Error: cannot resolve '`timestamp`'

November 5, 2021

I have to find the exact hour that most checkins occur in the Yelp dataset but I’m running up against this error for some reason. Here is my code so far:

from pyspark.sql.functions import udf
from pyspark.sql.functions import explode
from pyspark.sql.types import IntegerType
from pyspark.sql.types import ArrayType,StringType
from pyspark.sql import functions as F

square_udf_int = udf(lambda z: square(z), IntegerType())

checkin = spark.read.json('yelp_academic_dataset_checkin.json.gz')
datesplit = udf(lambda x: x.split(','),ArrayType(StringType()))
checkin.select('business_id',datesplit('date').alias('dates')).withColumn('checkin_date',explode('dates'))
datesplit = udf(lambda x: x.split(','),ArrayType(StringType()))
dates = checkin.select('business_id',datesplit('date').alias('dates')).withColumn('checkin_date',explode('dates'))
dates = dates.select("checkin_date")
dates.withColumn("checkin_date", F.date_trunc('checkin_date',
                   F.to_timestamp("timestamp", "yyyy-MM-dd HH:mm:ss 'UTC'"))).show(truncate=0)

And the error:

Py4JJavaError: An error occurred while calling o1112.withColumn.
: org.apache.spark.sql.AnalysisException: cannot resolve '`timestamp`' given input columns: [checkin_date];;
'Project [date_trunc(checkin_date, to_timestamp('timestamp, Some(yyyy-MM-dd HH:mm:ss 'UTC')), Some(Etc/UTC)) AS checkin_date#190]
+- Project [checkin_date#176]
   +- Project [business_id#6, dates#172, checkin_date#176]
      +- Generate explode(dates#172), false, [checkin_date#176]
         +- Project [business_id#6, <lambda>(date#7) AS dates#172]
            +- Relation[business_id#6,date#7] json

dates is just a Spark dataframe with one column named: "checkin_date" with only datetimes, so I’m not sure why this isn’t working.

>Solution :

The error you obtain simply means that in the following line of code, you are trying to access a column named timestamp and that it does not exist.

dates.withColumn("checkin_date", F.date_trunc('checkin_date',
                   F.to_timestamp("timestamp", "yyyy-MM-dd HH:mm:ss 'UTC'")))

Indeed, here is the signature of the to_timestamp function:

pyspark.sql.functions.to_timestamp(col, format=None)

The first argument is the column, the second is the format. I am assuming you are trying to parse a date and then truncate it. Let’s say you want to truncate the date to the month level. The correct way to do it would be:

dates.withColumn("checkin_date", F.date_trunc('month',
                   F.to_timestamp('checkin_date', "yyyy-MM-dd HH:mm:ss 'UTC'")))

pyspark

byMR

Published November 05, 2021

Add a comment

How do I make an if statement in with multiple parameter using excel columns in python csv

byMR

November 5, 2021

Questions

How to insert String variable into the String inside cy.get()

byMR

November 5, 2021

Questions

How to sum all the total in a table row? JS

byMR

November 5, 2021

Questions

Recognize a string containing only digits and "/" symbols in Python

byMR

November 5, 2021

Questions

Bootstrap 5 Popover dynamic text

byMR

November 5, 2021

Questions

C++20 Likely and UnLikely?

byMR

November 5, 2021

PySpark Error: cannot resolve '`timestamp`'

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How do I make an if statement in with multiple parameter using excel columns in python csv

How to insert String variable into the String inside cy.get()

How to sum all the total in a table row? JS

Recognize a string containing only digits and "/" symbols in Python

Bootstrap 5 Popover dynamic text

C++20 Likely and UnLikely?

Keep Up to Date with the Most Important News

PySpark Error: cannot resolve '`timestamp`'

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How do I make an if statement in with multiple parameter using excel columns in python csv

How to insert String variable into the String inside cy.get()

How to sum all the total in a table row? JS

Recognize a string containing only digits and "/" symbols in Python

Bootstrap 5 Popover dynamic text

C++20 Likely and UnLikely?

Discover more from Dev solutions