Open JSON files in a loop – formatting problem

I need to open the files in my S3 bucket. These are the files:

[image: listing of the files in the bucket]

I want to run the same piece of code on each of them, so I want to open them in a loop.

But I have a problem with formatting. The files are numbered from 001 to 999, so I cannot simply loop through range(1, 1000):

for i in range(1,1000):
    file_to_predict = spark.read.json(f"s3a://mu_bucket/company_v20_dl/part-00{i}.gz")

Here i is replaced with 1, 2, etc., but I would like it to be replaced with 001, 002, etc., always three characters wide (the highest number, 999, has three digits). Do you know how to handle this case?

[EDIT]
I am able to open a single file without unzipping it:

[image: screenshot of a single file being read]

Solution:

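The files have a .gz extension, which is the common extension for gzip. Spark's JSON reader decompresses gzip files transparently based on that extension, so you do not need to unzip them first, as the edit to the question confirms.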

Other than that, use {i:03d} inside the f-string to get a three-digit number padded with leading zeros.
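
A minimal sketch of that format spec, assuming the file names run from part-00001.gz to part-00999.gz as the loop in the question implies. The helper name part_paths is hypothetical, and the bucket path is copied from the question as-is.

```python
# Hypothetical helper: builds one path per file number.
# The prefix is taken from the question; adjust it to your bucket.
def part_paths(start=1, stop=999,
               prefix="s3a://mu_bucket/company_v20_dl/part-00"):
    # {i:03d} pads the integer with leading zeros to a width of 3,
    # so 1 becomes "001" and 42 becomes "042"; 999 stays "999".
    return [f"{prefix}{i:03d}.gz" for i in range(start, stop + 1)]


paths = part_paths()
print(paths[0])    # first generated path
print(paths[-1])   # last generated path
```

Each generated path can then be passed to spark.read.json(path) inside the loop, in place of the f-string from the question.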
