Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Reading xml files from S3 bucket in Python – Only the content of the last file is getting stored

I have 4 XML files inside the S3 bucket directory. When I’m trying to read the content of all the files, I find that only the content of the last file (XML4) is getting stored.

s3_bucket_name='test'
bucket=s3.Bucket(s3_bucket_name)
bucket_list = []
for file in bucket.objects.filter(Prefix = 'auto'):
    file_name=file.key
    if file_name.find(".xml")!=-1:
        bucket_list.append(file.key)

In the ‘bucket_list’, I can see that there are 4 files

for file in bucket_list:
    obj = s3.Object(s3_bucket_name,file)
    data = (obj.get()['Body'].read())
    
    
tree = ET.ElementTree(ET.fromstring(data))

What changes should be made in the code to read the content of all the XML files?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

As mentioned, since you have a list of files, you need a corresponding list of trees.

tree_list = []

for file in bucket_list:
    obj = s3.Object(s3_bucket_name,file)
    data = (obj.get()['Body'].read())
    tree_list.append(ET.ElementTree(ET.fromstring(data)))

Then you can start using tree_list for whatever purpose.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading