Creating a multidimensional list of similarly named files with different extensions

I have a directory of files that follows this file naming pattern:

alice_01.mov
alice_01.mp4
alice_02.mp4
bob_01.avi

My goal is to find all files at a given path and build a nested list from them, where each sublist holds a unique file name (without extension) followed by a list of the extensions found for that name, like so:

resulting_list = [
    ['alice_01', ['mov','mp4']],
    ['alice_02', ['mp4']],
    ['bob_01', ['avi']]
]

I have gotten this far:

import os

path = "user_files/"

def user_files(path):
    files = []
    for file in os.listdir(path):
        files.append(file)
    return files

file_array = []
for file in user_files(path):
    file_name = file.split(".")[0]
    file_ext = file.split(".")[1]
    if file_name not in (sublist[0] for sublist in file_array):
        file_array.append([file_name,[file_ext]])
    else:
        file_array[file_array.index(file_name)].append([file_name,[file_ext]])

print(file_array)

My problem is in the else condition but I’m struggling to get it right.
Any help is appreciated.

Solution:

Here’s how you can do it using a dict to store the results:

filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
]

file_dict = {}

for file in filenames:
    file_name, file_ext = file.split(".")[0:2]
    file_dict.setdefault(file_name, []).append(file_ext)

print(file_dict)

Result:

{'alice_01': ['mov', 'mp4'], 'alice_02': ['mp4'], 'bob_01': ['avi']}
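
If you need the nested-list shape from the question rather than a dict, one way is to convert at the end. This is a minimal sketch that starts from the dict printed above; the name resulting_list is taken from the question, and the ordering relies on dicts preserving insertion order (Python 3.7+).

```python
# Dict produced by the loop above (copied from the printed result).
file_dict = {
    "alice_01": ["mov", "mp4"],
    "alice_02": ["mp4"],
    "bob_01": ["avi"],
}

# Each key/value pair becomes a two-element sublist: [name, [extensions]].
resulting_list = [[name, exts] for name, exts in file_dict.items()]

print(resulting_list)
# [['alice_01', ['mov', 'mp4']], ['alice_02', ['mp4']], ['bob_01', ['avi']]]
```

As for the original else branch: file_array.index(file_name) looks for a string among sublists, so it raises ValueError. Grouping in a dict first avoids that lookup altogether.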

UPDATE: The code above fails on names with no dot (the two-value unpacking raises a ValueError) and mis-splits names that contain more than one dot, so here’s a slightly more robust version.

from pprint import pprint

filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "alice_02.mp4",
    "bob_01.avi",
    "john_007.json.xz",
    "john_007.json.txt.xz",
    "john_007.json.txt.zip",
    "tom_and_jerry",
    "tom_and_jerry.dat",
]

file_dict = {}

for file in filenames:
    parts = file.split(".")
    if len(parts) > 1:
        file_name = ".".join(parts[0:-1])
        file_ext = parts[-1]
    else:
        file_name = parts[0]
        file_ext = ""
    file_dict.setdefault(file_name, []).append(file_ext)

pprint(file_dict)

Result:

{'alice_01': ['mov', 'mp4'],
 'alice_02': ['mp4'],
 'bob_01': ['avi'],
 'john_007.json': ['xz'],
 'john_007.json.txt': ['xz', 'zip'],
 'tom_and_jerry': ['', 'dat']}
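
As an alternative to splitting by hand, the standard library's os.path.splitext does the "last dot" split for you. The sketch below is not part of the answer above; group_by_stem is a hypothetical helper name, and it assumes you want the extension stored without its leading dot, as in the results shown.

```python
import os
from collections import defaultdict


def group_by_stem(filenames):
    """Group extensions by file name (hypothetical helper, not from the answer)."""
    groups = defaultdict(list)
    for name in filenames:
        # splitext splits at the last dot: "a.json.xz" -> ("a.json", ".xz")
        stem, ext = os.path.splitext(name)
        # Drop the leading dot; names without an extension give "".
        groups[stem].append(ext.lstrip("."))
    return dict(groups)


filenames = [
    "alice_01.mov",
    "alice_01.mp4",
    "john_007.json.txt.xz",
    "john_007.json.txt.zip",
    "tom_and_jerry",
    "tom_and_jerry.dat",
]

print(group_by_stem(filenames))
```

To run it against a real directory, pass in sorted(os.listdir(path)); os.listdir returns entries in arbitrary order, so sorting keeps the output stable. One difference to be aware of: splitext treats a leading dot as part of the name, so ".bashrc" is grouped under ".bashrc" with an empty extension.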