Python: remove duplicates from a list of lists with uneven distribution

I have a Python list of lists. I want to merge all the inner lists that have at least one common element and remove the duplicate items.

The data set is large, and some of the inner lists share values with each other. Every group of lists connected by shared values should end up as a single list.

# sample data
foo = [
[0,1,2,6,9],
[0,1,2,6,5],
[3,4,7,3,2],
[12,36,28,73],
[537],
[78,90,34,72,0],
[573,73],
[99],
[41,44,79],
]

# I want to get this
[
[0,1,2,6,9,5,3,4,7,3,2,78,90,34,72,0],
[12,36,28,73,573,73,573],
[99],
[41,44,79],
]

Lists that share even one common element are grouped together.

The original data file is linked in a comment in the code below.

Edit

This is what I am trying:

import json

data = json.load(open('x.json')) # https://files.catbox.moe/y1yt5w.json


class Relations:
    def __init__(self):
        pass

    def process_relation(self, flat_data):
        relation_keys = []
        rel = {}
        for i in range(len(flat_data)):
            rel[i] = []
            for n in flat_data:
                if i in n:
                    rel[i].extend(n)
        return {k:list(set(v)) for k,v in rel.items()}

    def process(self, flat_data):
        rawRelations = self.process_relation(flat_data)
        return rawRelations

rel = Relations()
print(json.dumps(rel.process(data), indent=4), file=open('out.json', 'w')) # https://files.catbox.moe/n65tie.json

NOTE: the largest number present in the data is equal to the length of the list of lists.

Solution:

A simple (and probably non-optimal) algorithm that modifies the input data in place:

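target_idx = 0

while target_idx < len(data):
    src_idx = target_idx + 1
    did_merge = False
    while src_idx < len(data):
        # Merge any later list that shares at least one element with the target.
        if set(data[target_idx]) & set(data[src_idx]):
            data[target_idx].extend(data[src_idx])
            data.pop(src_idx)  # this was merged
            did_merge = True
            continue  # with same src_idx, which now points at the next list
        src_idx += 1
    # After a merge the target has grown, so rescan it before moving on.
    if not did_merge:
        target_idx += 1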
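
The loop above keeps repeated items in the merged lists, because `extend` copies everything across. As an alternative, the grouping can be treated as finding connected components with a union-find (disjoint-set) structure. The sketch below is not part of the original answer: `merge_overlapping` and its inner helpers are hypothetical names, and it assumes every item is hashable. It returns new lists instead of modifying the input, and drops repeated items while keeping first-seen order.

```python
def merge_overlapping(lists):
    """Merge lists that share at least one item; drop duplicate items.

    Hypothetical helper, not from the original answer.
    """
    parent = {}  # maps each item to its parent in the union-find forest

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every item of a list to that list's first item.
    for items in lists:
        for item in items:
            parent.setdefault(item, item)
        for item in items[1:]:
            union(items[0], item)

    # Collect items per root; dict keys preserve first-seen order.
    groups = {}
    for items in lists:
        for item in items:
            groups.setdefault(find(item), {})[item] = None
    return [list(members) for members in groups.values()]


foo = [
    [0, 1, 2, 6, 9],
    [0, 1, 2, 6, 5],
    [3, 4, 7, 3, 2],
    [12, 36, 28, 73],
    [537],
    [78, 90, 34, 72, 0],
    [573, 73],
    [99],
    [41, 44, 79],
]
merged = merge_overlapping(foo)
```

Each item is visited a small, near-constant number of times, so this should scale better than repeatedly rebuilding sets on large inputs. If only the deduplication is needed, applying `list(dict.fromkeys(group))` to each list produced by the in-place loop gives the same effect.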