
Remove duplicates from List of dynamic objects

Goal: remove duplicates that occur within the same deepest sub-list. Keep everything else.

The list contains multiple levels of nesting: dict -> dict -> list.

However, two different sub-lists may contain the exact same sentence as each other; those occurrences need to be kept.


set() seems ideal, but I want this applied to the deepest sub-lists only, not to the my_list object itself. The structure may change and contain deeper dicts and lists on different runs.


Code:

I’ve had many variations of this, but in reality my_list can have any structure.

Is what I want possible if the structure may differ?

my_list = ...  # elided example data; structure shown below

for i, ele in enumerate(my_list):
    if isinstance(ele, list):
        my_list[i] = list(set(ele))  # assign back through the index
    elif isinstance(ele, dict):
        pass  # unsure how to recurse into nested dicts/lists from here
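One pitfall worth calling out in the attempt above: rebinding the loop variable never modifies the outer list. A minimal sketch of the difference (the `nested` sample data is made up for illustration):

```python
nested = [["a", "a", "b"], {"k": ["a", "a"]}]

# Rebinding the loop variable has no effect on the outer list.
for ele in nested:
    if isinstance(ele, list):
        ele = list(set(ele))  # builds a new list, then discards it

print(nested[0])  # still ["a", "a", "b"]

# Assigning back through the index actually replaces the element.
for i, ele in enumerate(nested):
    if isinstance(ele, list):
        nested[i] = sorted(set(ele))  # sorted only to make the output deterministic

print(nested[0])  # ["a", "b"]
```

This only handles one level of lists, though; the dict element is untouched, which is exactly why a recursive approach is needed.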

my_list:

e.g. 1st PDF -> ECON -> awards and 1st PDF -> ECON -> security contain the same duplicates.

[
    {
        "../data/gri/reports/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf": {
            "COMP": {
                "Behaviour": [
                    "we focus apply measures four elements safety culture systems processes skills knowledge individuals behaviours attitudes perception leadership"
                ]
            },
            "ECON": {
                "subsidies": [
                    "meanwhile main recent regulatory impact business significant phasing subsidies gas electricity prices expected continue next years well nationwide strategy allocates natural gas conservatively"
                ],
                "awards": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards",
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ],
                "security": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards",
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ]
            }
        }
    },
    {
        "../data/gri/reports/GRI_2018_Report.pdf": {
            "COMP": {
...

Desired List:

[
    {
        "../data/gri/reports/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf": {
            "COMP": {
                "Behaviour": [
                    "we focus apply measures four elements safety culture systems processes skills knowledge individuals behaviours attitudes perception leadership"
                ]
            },
            "ECON": {
                "subsidies": [
                    "meanwhile main recent regulatory impact business significant phasing subsidies gas electricity prices expected continue next years well nationwide strategy allocates natural gas conservatively"
                ],
                "awards": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ],
                "security": [
                    "ensure robust security 100 readiness times participate international awards rospa bsc awards"
                ]
            }
        }
    },
    {
        "../data/gri/reports/GRI_2018_Report.pdf": {
            "COMP": {
...

Please let me know if I should clarify anything else.

Solution:

So it sounds like the only duplicates you care about are when you have a list of strings, so we can make some assumptions:

  • It’s only JSON (lists, dicts, strings and primitives)
  • If we fail to hash an object, then it can’t be a duplicate
  • Order of deduped lists doesn’t matter

So let’s use recursion:

def dedup(obj):
    if isinstance(obj, list):
        try:
            # We try to dedupe as if everything is hashable,
            # but this will fail for a list of dicts, so fallback
            # in that case.
            return list({dedup(x) for x in obj})
        except TypeError:
            return [dedup(x) for x in obj]
    elif isinstance(obj, dict):
        return {k: dedup(v) for k, v in obj.items()}
    else:
        # this is some kind of primitive (str/int/float/bool/None)
        return obj
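As a quick check on your sample data (with a shortened, hypothetical key `"report.pdf"` standing in for the full path, and the function repeated so the snippet runs standalone), the within-list duplicates collapse while the identical sentence survives in both `awards` and `security`:

```python
def dedup(obj):
    if isinstance(obj, list):
        try:
            # Dedupe as if every element is hashable; falls back on TypeError.
            return list({dedup(x) for x in obj})
        except TypeError:
            return [dedup(x) for x in obj]
    elif isinstance(obj, dict):
        return {k: dedup(v) for k, v in obj.items()}
    else:
        return obj

sentence = ("ensure robust security 100 readiness times participate "
            "international awards rospa bsc awards")
data = [{"report.pdf": {"ECON": {"awards": [sentence, sentence],
                                 "security": [sentence, sentence]}}}]

econ = dedup(data)[0]["report.pdf"]["ECON"]
print(econ["awards"])    # [sentence] -- within-list duplicate removed
print(econ["security"])  # [sentence] -- same sentence kept in the other list
```

Note that the set-based branch does not preserve the original list order; if order matters, `list(dict.fromkeys(dedup(x) for x in obj))` is an order-preserving alternative that still raises TypeError on unhashable elements.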