
How to get the file type from a complex image URL in Python?

I want to get the image file extension from image URLs like the ones below:

from os.path import splitext
image = ['ai','bmp','gif','ico','jpeg','jpg','png','ps','psd','svg','tif','tiff','webp']
def splitext_(path, extensions):
    for ext in extensions:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return splitext(path)

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"
#val = "https://www.needmode.com/wp-content/uploads/2023/04/%D9%84%D9%88%D8%A7%D8%B2%D9%85-%D8%AA%D8%AD%D8%B1%DB%8C%D8%B1.webp"
ex_filename, ext = splitext_(val,image)
ex_extension = ext.replace(".", "", 1)
im_extension = ex_extension.lower()

print(im_extension)

The problem is that this method does not work on URLs like the one below:

https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80

For this example URL the result is an empty string, but the method works on normal URLs.
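A quick way to see why the result is empty (a sketch; the variable name `url` is only illustrative): `os.path.splitext` treats the whole URL as a file path, so it only looks at the text after the final slash. In this URL that is `quality,q_80` from the query string, which contains no dot, and none of the extensions in the list match the end of the string either. Running `splitext` on the parsed path alone finds `.jpg`.

```python
from os.path import splitext
from urllib.parse import urlparse

url = ("https://dkstatics-public.digikala.com/digikala-products/"
       "9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg"
       "?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80")

# On the raw URL, the last "path component" is "quality,q_80" (from the query
# string), which has no dot, so splitext reports no extension
print(repr(splitext(url)[1]))  # ''

# On the path part alone, the file name ends in ".jpg"
print(repr(splitext(urlparse(url).path)[1]))  # '.jpg'
```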


Solution:

Edit: here is how to handle multiple extensions.

For this, it’s better to use @Andrej Kesely’s answer and parse the URL with urlparse first. Working on the URL as a plain string also splits the host on its dots, which is harder to manage; you would end up rewriting urlparse yourself.
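To see what `urlparse` does with the question’s URL, here is a minimal sketch printing the relevant fields (the name `parts` is illustrative). The host ends up in `netloc` and the `x-oss-process` parameters end up in `query`, so only `path` needs to be inspected for the extension.

```python
from urllib.parse import urlparse

url = ("https://dkstatics-public.digikala.com/digikala-products/"
       "9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg"
       "?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80")

parts = urlparse(url)
print(parts.netloc)  # dkstatics-public.digikala.com
print(parts.path)    # /digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg
print(parts.query)   # x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80
```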

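from posixpath import basename
from urllib.parse import urlparse

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.tar.gz?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"

parsed_url = urlparse(val)  # host, path and query string are separated

# Keep only the file name so dots in directory names are ignored,
# then take everything after the first dot: ['tar', 'gz']
extension = basename(parsed_url.path).split(".")[1:]
print(extension)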
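Applied to the original question, the same idea can be wrapped in a small helper that checks the result against the question’s list of image extensions. This is a sketch, not part of the original answer: `IMAGE_EXTENSIONS` and `image_extension` are hypothetical names, and it uses `pathlib.PurePosixPath.suffix` to take the last extension of the file name. It also decodes percent-encoded paths such as the commented-out needmode.com URL in the question.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Image extensions taken from the question's list
IMAGE_EXTENSIONS = frozenset({'ai', 'bmp', 'gif', 'ico', 'jpeg', 'jpg', 'png',
                              'ps', 'psd', 'svg', 'tif', 'tiff', 'webp'})


def image_extension(url, allowed=IMAGE_EXTENSIONS):
    """Return the lower-case image extension of url, or None (hypothetical helper).

    urlparse drops the query string and fragment; unquote decodes
    percent-encoded characters in the remaining path.
    """
    path = unquote(urlparse(url).path)
    # .suffix is the last ".xxx" of the final path component, or "" if there is none
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    return ext if ext in allowed else None


print(image_extension(
    "https://dkstatics-public.digikala.com/digikala-products/"
    "9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg"
    "?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"))  # jpg
print(image_extension(
    "https://www.needmode.com/wp-content/uploads/2023/04/"
    "%D9%84%D9%88%D8%A7%D8%B2%D9%85-%D8%AA%D8%AD%D8%B1%DB%8C%D8%B1.webp"))  # webp
```

Returning None for anything outside the list keeps non-image URLs (for example a `.txt` file) from being reported as images.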

First answer: here is how you can do it:

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"
print(val.split("?")[0].split(".")[-1])

First split on the question mark and keep the URL part, dropping the query parameters. Then split on the dot and keep the last part, which is the extension.

It won’t work with multi-part extensions like tar.gz: you would only get gz.
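If you need tar.gz as a whole, one option (a sketch, not from either answer) is to compare the file name against an explicit list of extensions, longest first. This is similar in spirit to the `splitext_` function in the question, but it is applied to the parsed path rather than to the full URL. `KNOWN_EXTENSIONS` and `known_extension` are hypothetical names, and multi-part extensions must be listed explicitly.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of recognised extensions; multi-part ones such as
# "tar.gz" have to be listed explicitly
KNOWN_EXTENSIONS = ("tar.gz", "tar.bz2", "gz", "bz2", "jpg", "jpeg", "png", "webp")


def known_extension(url, known=KNOWN_EXTENSIONS):
    """Return the longest known extension the URL's file name ends with, or None."""
    # .name is the final path component; the query string was removed by urlparse
    name = PurePosixPath(urlparse(url).path).name.lower()
    # Try longer extensions first so "tar.gz" wins over "gz"
    for ext in sorted(known, key=len, reverse=True):
        if name.endswith("." + ext):
            return ext
    return None


print(known_extension(
    "https://dkstatics-public.digikala.com/digikala-products/"
    "9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.tar.gz"
    "?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"))  # tar.gz
```

Unlike splitting on every dot, this also gives jpg (not photo.jpg) for a name like my.photo.jpg. The trade-off is that it only recognises extensions you list.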


Discover more from Dev solutions
