I need to execute postgressql in pyspark
I tried following:-
spark = SparkSession.builder.appName("ReadFromPostgreSQL").getOrCreate()
url = "jdbc:postgresql://localhost:5432/database_example"
properties = {"user": "postgres", "password": "1234", "driver": "org.postgresql.Driver"}
query = "SELECT * FROM arpan.check_master_planning_family"
jdbcDF = spark.read.jdbc(url=url, table=query, properties=properties)
Here I have replace username, password and url with my server url, username and password.
But I received this error:-
Py4JJavaError:-org.postgresql.util.PSQLException: ERROR: syntax error at or near "SELECT"
This query run perfectly in postgres but here I m getting error. Please Solve this
>Solution :
Change the table=query to table="<table_name>"
spark = SparkSession.builder.appName("ReadFromPostgreSQL").getOrCreate()
url = "jdbc:postgresql://localhost:5432/database_example"
properties = {"user": "postgres", "password": "1234", "driver": "org.postgresql.Driver"}
employee_df = spark.read.jdbc(url=url, table="employee", properties=properties)
department_df = spark.read.jdbc(url=url, table="department", properties=properties)
1. Using DataFrame
employee_df.join(department_df, employee_df.dept_id == department_df.id, "inner").show()
2. Using Spark SQL
Create a temp view out of this DataFrame
employee_df.createOrReplaceTempView("employee")
department_df.createOrReplaceTempView("department")
Now use spark.sql to write a code in SQL format:
spark.sql('''
SELECT * FROM employee AS emp
JOIN department AS dept
ON emp.dept_id = dept.id
''').show()