Trying to read CSV files in PySpark, but it is also reading text files

I have a folder containing both .txt and .csv files (with exactly the same column names).

However, when I try to read only the CSV files in PySpark with the code below, it reads both the text and CSV files and appends them together:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("CSV Reader").getOrCreate()

csv_path = "path/to/csv/folder"

df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load(csv_path)

>Solution :

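Setting format("csv") only tells Spark how to parse the files; it does not filter by extension, so every file in the folder is loaded. You can use the pathGlobFilter option (available in Spark 3.0 and later) to define a file-name pattern so that only the .csv files are read: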

df = spark.read.format("csv").option("header", "true").option("inferSchema", "true") \
    .option("pathGlobFilter", "*.csv").load(csv_path)

Hope this helps. I found that option here: https://dbmstutorials.com/pyspark/spark-read-write-dataframe-options.html
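If pathGlobFilter is not available (for example on a Spark version older than 3.0), one possible workaround is to build the list of .csv files yourself and pass that list to the reader. The following is only a minimal sketch under some assumptions: the folder is on the local filesystem (pathlib cannot list HDFS or S3 paths), and the helper names list_csv_files and read_csv_only are hypothetical, not part of PySpark.

```python
from pathlib import Path


def list_csv_files(folder):
    """Return the sorted absolute paths of the *.csv files directly inside `folder`.

    Assumption: `folder` is a directory on the local filesystem.
    """
    return sorted(str(p.resolve()) for p in Path(folder).glob("*.csv") if p.is_file())


def read_csv_only(spark, folder):
    """Read only the .csv files in `folder` into a single DataFrame.

    `spark` is an existing SparkSession; this helper is hypothetical.
    """
    paths = list_csv_files(folder)
    if not paths:
        raise FileNotFoundError(f"no .csv files found in {folder}")
    # DataFrameReader.csv accepts a list of paths as well as a single path,
    # so the .txt files are never handed to Spark at all.
    return spark.read.csv(paths, header=True, inferSchema=True)
```

With the session from the question, this would be called as df = read_csv_only(spark, csv_path). For a simple case, putting the glob directly in the path, as in spark.read.csv("path/to/csv/folder/*.csv", header=True, inferSchema=True), should also restrict the read to .csv files.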
