NameError: name 'filedata' is not defined, when extracting data from the pdf or doc

I’m downloading attachments from emails and storing them in a local folder. After uploading each one to an S3 bucket, I extract the text from the attachment.

For most emails it works fine, but for some it fails with the error "variable not defined". The emails it fails on have 2 attachments, but that shouldn’t matter, since I’m only extracting data depending on the suffix of the file.

What am I doing wrong here?

query = Q(subject = 'Summary of Unmatched Responses')
recent_emails = account.inbox.filter(~query, datetime_received__range=(
    ews_bfr,
    ews_bfr1
))[:1]
       

Raw_data = []

for item in recent_emails:
    for attachment in item.attachments:
        if isinstance(attachment, FileAttachment):
            file_name = str(attachment.name).replace(' ', '_')
            fpath = os.path.join(r'E:/email_download', file_name)
            with open(fpath, 'wb') as f:
                f.write(attachment.content)
            #print('Saved attachment to', fpath)
                s3.meta.client.upload_file(r'E:/email_download/' + file_name, 'aaa-test-1234', 'email_attachment/{}'.format(file_name))

                if file_name.endswith('.pdf'):
                    filedata1 = extract_text('E:/email_download/' + file_name)
                    filedata2 = filedata1.replace("\n", '')
                    filedata = filedata2.replace(" ", '')
                    print(filedata)
                elif file_name.endswith('.docx'):
                    filedata1 = textract.process('E:/email_download/' + file_name)
                    filedata2 = str(filedata1).replace("\n", '')
                    filedata = filedata2.replace(" ", '')
                    print(filedata)

        elif isinstance(attachment, ItemAttachment):
            print(attachment.item.subject, attachment.item.body)

        Raw_data.append(filedata)

        print(Raw_data)

Error:

Traceback (most recent call last):
  File "E:\pythonProject\main.py", line 107, in <module>
    Raw_data.append(filedata)
NameError: name 'filedata' is not defined

Solution:

You use the variable filedata unconditionally inside the loop "for attachment in item.attachments:", but it is only defined when isinstance(attachment, FileAttachment) is true and, in addition, either file_name.endswith('.pdf') or file_name.endswith('.docx') is true.

If those conditions are not true in the first iteration of the loop, the variable is not defined and you get this error.
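To make that concrete, here is a stripped-down sketch of the same control flow with no email or S3 code involved. The attachment names and the variable name text are made up for illustration; the first name has an unsupported suffix, so the assignment is skipped before the append runs.

```python
# Hypothetical attachment names; the first one is neither .pdf nor .docx.
attachment_names = ['logo.png', 'report.pdf']

collected = []
error = None
try:
    for name in attachment_names:
        if name.endswith(('.pdf', '.docx')):
            text = 'text of ' + name   # assigned only for supported suffixes
        collected.append(text)         # runs for every name, assigned or not
except NameError as exc:
    error = exc
    print(type(exc).__name__, exc)
```

The very first append fails because nothing has been assigned to text yet, which is the situation the traceback in the question reports.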

Worse, if it was defined in one iteration of the loop but the conditions are not true in a later iteration, the variable still holds the value from the earlier iteration. No error is raised, and stale text from a previous attachment (possibly one belonging to a different email) is appended again.
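The silent variant is easy to reproduce by putting the supported file first. Again, this is only a sketch with made-up names (body, attachment_names), not the code from the question.

```python
# Hypothetical attachment names; the supported file comes first this time.
attachment_names = ['report.pdf', 'logo.png']

collected = []
for name in attachment_names:
    if name.endswith(('.pdf', '.docx')):
        body = 'text of ' + name   # assigned for report.pdf only
    collected.append(body)         # for logo.png this re-appends the old value

print(collected)  # ['text of report.pdf', 'text of report.pdf']
```

No exception is raised, but the text of report.pdf ends up in the list twice, once on behalf of logo.png.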

Either make sure that filedata is defined in every case before you use it, or only use it when it has actually been defined.
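One possible way to do both at once is to move the extraction into a helper that always returns something: the cleaned text, or None when the suffix is unsupported. The sketch below is an assumption about how you might restructure the loop, not the only fix. The helper extract_text_for and the fake_extractors dictionary are hypothetical names; the stand-in lambdas replace extract_text and textract.process so the example runs without those libraries installed.

```python
def extract_text_for(file_name, extractors):
    """Return cleaned text for a supported suffix, or None otherwise.

    `extractors` maps a suffix to a callable that takes a path and returns str.
    """
    for suffix, extractor in extractors.items():
        if file_name.lower().endswith(suffix):
            text = extractor(file_name)
            # Same cleanup as in the question: drop newlines and spaces.
            return text.replace('\n', '').replace(' ', '')
    return None


# Stand-in extractors (hypothetical) so the sketch is runnable on its own.
fake_extractors = {
    '.pdf': lambda path: 'pdf text\nline two',
    '.docx': lambda path: 'docx text',
}

raw_data = []
for file_name in ['report.pdf', 'logo.png', 'notes.docx']:
    filedata = extract_text_for(file_name, fake_extractors)  # always assigned
    if filedata is not None:       # append only when text was extracted
        raw_data.append(filedata)

print(raw_data)  # ['pdftextlinetwo', 'docxtext']
```

Because filedata is assigned fresh on every iteration, it can neither be undefined nor carry over from a previous attachment. In the real loop, the append would also need to sit inside the FileAttachment branch. As far as I know, textract.process returns bytes, so a real .docx wrapper would probably need to decode the result rather than call str() on it.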
