How to get the file type from a complex image URL in Python?

December 10, 2023

I want to get the file extension from image URLs with the following code:

from os.path import splitext
image = ['ai','bmp','gif','ico','jpeg','jpg','png','ps','psd','svg','tif','tiff','webp']
def splitext_(path, extensions):
    for ext in extensions:
        if path.endswith(ext):
            return path[:-len(ext)], path[-len(ext):]
    return splitext(path)

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"
#val = "https://www.needmode.com/wp-content/uploads/2023/04/%D9%84%D9%88%D8%A7%D8%B2%D9%85-%D8%AA%D8%AD%D8%B1%DB%8C%D8%B1.webp"
ex_filename, ext = splitext_(val,image)
ex_extension = ext.replace(".", "", 1)
im_extension = ex_extension.lower()

print(im_extension)

The problem is that this method does not work on URLs like the one below:

https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80

For this URL the result is an empty string, but the method works on plain URLs.
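The empty result can be reproduced directly. None of the listed extensions match the end of this URL (it ends in "q_80"), so splitext_() falls back to os.path.splitext(), which only looks at what follows the last "/". In this URL the last "/" is inside the query string, so the final component is "quality,q_80", which has no dot. A minimal check, reusing the URL from the question:

```python
from os.path import splitext

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"

# splitext() only inspects the text after the last "/", here "quality,q_80",
# which contains no dot, so the extension part comes back empty
root, ext = splitext(val)
print(repr(ext))  # ''
```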

Solution:

Edit: here is how to handle multiple extensions.

For this, it is better to parse the URL as in @Andrej Kesely's answer. Working on the URL as a plain string means the host name (which contains dots) gets split as well, and that is harder to handle: you would end up rewriting urlparse yourself.
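To illustrate the host problem, here is a sketch with a made-up URL (example.com is only a placeholder) whose file name has no extension. The string-only approach picks up the dot in the host name, while urlparse returns a path that contains no dot at all:

```python
from urllib.parse import urlparse

# Hypothetical URL: the file name "photo" has no extension
url = "https://example.com/images/photo?size=300"

# String-only approach: the last dot left after dropping the query is in the host
naive = url.split("?")[0].split(".")[-1]
print(naive)  # com/images/photo

# Parsing first: only the path is inspected, and it has no dot
path = urlparse(url).path
print(path)  # /images/photo
print(path.rsplit("/", 1)[-1].split(".")[1:])  # []
```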

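from urllib.parse import urlparse
from posixpath import basename

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.tar.gz?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"

parsed_url = urlparse(val)

# parsed_url.path holds only the path (no host, no query string);
# basename() drops the directories so a dot in a folder name is not picked up
extension = basename(parsed_url.path).split(".")[1:]
print(extension)  # ['tar', 'gz']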
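If you only need the last extension, lower-cased as in the question, the same idea can be wrapped in a small function. This is a minimal sketch; image_extension is a hypothetical name, and it uses posixpath.splitext on the file name taken from the parsed path:

```python
from posixpath import basename, splitext
from urllib.parse import urlparse


def image_extension(url):
    """Return the lower-cased last extension of the file named in url, or ''."""
    # urlparse() separates the path from the query string and the fragment
    path = urlparse(url).path
    # basename() keeps only the last path segment, so dots in the host
    # or in directory names are ignored
    name = basename(path)
    # splitext() returns the part from the last dot, e.g. ".jpg"
    ext = splitext(name)[1]
    return ext[1:].lower()


urls = [
    "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80",
    "https://www.needmode.com/wp-content/uploads/2023/04/%D9%84%D9%88%D8%A7%D8%B2%D9%85-%D8%AA%D8%AD%D8%B1%DB%8C%D8%B1.webp",
]
for url in urls:
    print(image_extension(url))  # jpg, then webp
```

If you also want to reject anything that is not an image, you can check the result against the image list from the question.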

First answer.
Here is how you can do it:

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.jpg?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"
print(val.split("?")[0].split(".")[-1])

First split on the question mark and keep the URL part, dropping the query parameters. Then split on the dot and keep the last part, which is the extension.

It won’t work with multiple extensions like tar.gz: you would only get gz.
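If you do need compound extensions such as tar.gz, one option (swapping pathlib in for the manual splitting) is PurePosixPath.suffixes applied to the parsed path. A sketch using the tar.gz URL from the edit:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

val = "https://dkstatics-public.digikala.com/digikala-products/9f4cb4e049e7a5d48c7bc22257b5031ee9a5eae8_1602179467.tar.gz?x-oss-process=image/resize,m_lfit,h_300,w_300/quality,q_80"

# Work on the path only, so the query string and the host are ignored
path = PurePosixPath(urlparse(val).path)

print(path.suffix)             # .gz  (last extension only)
print(path.suffixes)           # ['.tar', '.gz']
print("".join(path.suffixes))  # .tar.gz
```

Note that suffixes splits on every dot in the file name, so a name like photo.v2.jpg yields ['.v2', '.jpg']; filter against a list of known extensions if that matters.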