Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pyspark trimming all fields bydefault while writing into csv in python

I am trying to write the dataset into csv file using spark 3.3 , Scala 2 python code and bydefault its trimming all the String fields. For example, for the below column values :

" Text123"," jacob "

the output in csv is:

"Text123","jacob"

I dont want to trim any String fields.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Below is my code:

args = getResolvedOptions(sys.argv, ['target_BucketName', 'JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Convert DynamicFrame to DataFrame 
df_app = AWSGlueDataCatalog_node.toDF()

# Repartition the DataFrame to control output files APP
df_repartitioned_app = df_app.repartition(10)  

# Check for empty partitions and write only if data is present
if not df_repartitioned_app.rdd.isEmpty():
    df_repartitioned_app.write.format("csv") \
        .option("compression", "gzip") \
        .option("header", "true") \
        .option("delimiter", "|") \
        .save(output_path_app)

>Solution :

set the ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace options to false. spark by default ignores whitespaces when it’s writing:

    df_repartitioned_app.write.format("csv") \
        .option("compression", "gzip") \
        .option("header", "true") \
        .option("delimiter", "|") \
        .option("ignoreLeadingWhiteSpace", "false") \
        .option("ignoreTrailingWhiteSpace", "false") \
        .save(output_path_app)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading