Reading xml files from S3 bucket in Python – Only the content of the last file is getting stored

May 5, 2022

I have 4 XML files in an S3 bucket directory. When I try to read the content of all the files, only the content of the last file (XML4) is stored.

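import boto3

# Assumed setup: the snippet uses s3.Bucket and s3.Object, so s3 is taken to be a boto3 resource
s3 = boto3.resource('s3')
s3_bucket_name = 'test'
bucket = s3.Bucket(s3_bucket_name)
bucket_list = []
# Collect the keys of all XML files under the 'auto' prefix
for file in bucket.objects.filter(Prefix='auto'):
    if file.key.endswith('.xml'):
        bucket_list.append(file.key)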

In ‘bucket_list’, I can see that there are 4 files.

for file in bucket_list:
    obj = s3.Object(s3_bucket_name,file)
    data = (obj.get()['Body'].read())
    
    
tree = ET.ElementTree(ET.fromstring(data))

What changes should be made in the code to read the content of all the XML files?

Solution:

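In your code the tree is built after the loop has finished, so by then data holds only the content of the last file. Since you have a list of files, you need a corresponding list of trees, with the parsing done inside the loop.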

import xml.etree.ElementTree as ET

tree_list = []

for file in bucket_list:
    obj = s3.Object(s3_bucket_name, file)
    data = obj.get()['Body'].read()
    # Parse inside the loop so each file gets its own tree
    tree_list.append(ET.ElementTree(ET.fromstring(data)))

You can then iterate over tree_list to work with each parsed document.
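
If you also need to know which tree came from which file, one option is to key the trees by object key instead of keeping a plain list. The following is only a minimal sketch: the helper name parse_xml_payloads and the sample payloads are hypothetical, and the in-memory byte strings stand in for the S3 object bodies so that the example runs without AWS access.

```python
import xml.etree.ElementTree as ET


def parse_xml_payloads(payloads):
    """Parse each (key, raw_bytes) pair and return a dict of key -> ElementTree.

    Hypothetical helper for illustration; not part of boto3 or ElementTree.
    """
    trees = {}
    for key, raw in payloads:
        # Parsing happens inside the loop, so no payload is overwritten.
        trees[key] = ET.ElementTree(ET.fromstring(raw))
    return trees


# Hypothetical in-memory stand-ins for the bodies of the S3 objects.
sample_payloads = [
    ("auto/XML1.xml", b"<root><item>1</item></root>"),
    ("auto/XML2.xml", b"<root><item>2</item></root>"),
]

trees = parse_xml_payloads(sample_payloads)
for key, tree in trees.items():
    print(key, tree.getroot().find("item").text)
```

With the real bucket, the pairs could be produced from the existing names, for example by passing ((key, s3.Object(s3_bucket_name, key).get()['Body'].read()) for key in bucket_list) to the helper.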