
associate a list of values to each set element

I’m trying to come up with the best solution for the following problem:

I have a list of filenames, and associated with each filename is an ID; these IDs are non-unique, meaning that several filenames might be associated with one ID.

So I could pack my data up as: (ID, [filename1, filename2,…])

The problem is that I would like to treat the IDs as a set, because I need to group them and compute differences and intersections against another predefined grouping of these IDs. These operations need to be relatively fast, since I have about a million IDs.

But I don't know of a way to keep each ID associated with its list of filenames while treating the ID as an element of a set. Is this possible with sets, or is there a set extension that enables it?

Solution:

It sounds like your data looks something like the sample data below. If so, the code shows how to use a hash table to do what you're asking. The hash table can be either a Python dict (keyed on ID, with a list of filenames as the associated value) or simply a set of IDs, if that's what you really want (though, as others have suggested in the comments, a dict is potentially the best solution).

from collections import defaultdict

files = [
    {'filename': 'foo101', 'id': 1},
    {'filename': 'foo102', 'id': 1},
    {'filename': 'foo103', 'id': 1},
    {'filename': 'foo201', 'id': 2},
    {'filename': 'foo202', 'id': 2},
    {'filename': 'foo301', 'id': 3},
    {'filename': 'foo401', 'id': 4},
]
# Group filenames by ID: each ID maps to the list of its filenames.
fileDict = defaultdict(list)
for d in files:
    fileDict[d['id']].append(d['filename'])
for fileId, fileNames in fileDict.items():
    print(fileId, fileNames)
# Iterating a dict yields its keys, so this builds the set of IDs.
idSet = set(fileDict)
print(idSet)

Sample output:

1 ['foo101', 'foo102', 'foo103']
2 ['foo201', 'foo202']
3 ['foo301']
4 ['foo401']
{1, 2, 3, 4}

The above code uses a defaultdict(list) for convenience, but you could also use a regular dict as follows:

files = [
    {'filename': 'foo101', 'id': 1},
    {'filename': 'foo102', 'id': 1},
    {'filename': 'foo103', 'id': 1},
    {'filename': 'foo201', 'id': 2},
    {'filename': 'foo202', 'id': 2},
    {'filename': 'foo301', 'id': 3},
    {'filename': 'foo401', 'id': 4},
]
fileDict = {}
for d in files:
    # Create the list the first time an ID is seen.
    if d['id'] not in fileDict:
        fileDict[d['id']] = []
    fileDict[d['id']].append(d['filename'])
for fileId, fileNames in fileDict.items():
    print(fileId, fileNames)
# Iterating a dict yields its keys, so this builds the set of IDs.
idSet = set(fileDict)
print(idSet)
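
Since the question asks about intersections and differences against another grouping of IDs, here is a minimal sketch of how that might look with the dict approach. It relies on the fact that dict.keys() returns a set-like view that supports operators such as & and -, so a separate set is often not needed. The name otherIds and its contents are made up for illustration; substitute your own predefined grouping.

```python
from collections import defaultdict

# Same sample data as in the answer above.
files = [
    {'filename': 'foo101', 'id': 1},
    {'filename': 'foo102', 'id': 1},
    {'filename': 'foo103', 'id': 1},
    {'filename': 'foo201', 'id': 2},
    {'filename': 'foo202', 'id': 2},
    {'filename': 'foo301', 'id': 3},
    {'filename': 'foo401', 'id': 4},
]
fileDict = defaultdict(list)
for d in files:
    fileDict[d['id']].append(d['filename'])

# Hypothetical predefined grouping of IDs (an assumption, not from the question).
otherIds = {2, 3, 5}

# The keys view behaves like a set, so set operators work on it directly.
common = fileDict.keys() & otherIds        # IDs present in both
onlyInFiles = fileDict.keys() - otherIds   # IDs that appear only in fileDict
# set.difference accepts any iterable; iterating a dict yields its keys.
onlyInOther = otherIds.difference(fileDict)

# The filenames stay attached to each ID, so they can be looked up afterwards.
commonFiles = {fileId: fileDict[fileId] for fileId in sorted(common)}

print(common)       # {2, 3}
print(onlyInFiles)  # {1, 4}
print(onlyInOther)  # {5}
print(commonFiles)  # {2: ['foo201', 'foo202'], 3: ['foo301']}
```

Membership tests and set operations on dict keys are hash-based, so this should scale reasonably to around a million IDs, though it is worth measuring on your real data.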