
Remove duplicates from json array in a json file

I have a very large json file with thousands of rows that look like this (scraped):

[
{"result": ["/results/1138/dundalk-aw/2022-03-11/806744", "/results/1138/dundalk-aw/2022-03-11/806745", "/results/1138/dundalk-aw/2022-03-11/806746", "/results/1138/dundalk-aw/2022-03-11/806747", "/results/1138/dundalk-aw/2022-03-11/806748", "/results/1138/dundalk-aw/2022-03-11/806749", "/results/1138/dundalk-aw/2022-03-11/806750", "/results/1138/dundalk-aw/2022-03-11/806751", "/results/14/exeter/2022-03-11/804190", "/results/14/exeter/2022-03-11/804193", "/results/14/exeter/2022-03-11/804194", "/results/14/exeter/2022-03-11/804192", "/results/14/exeter/2022-03-11/804196", "/results/14/exeter/2022-03-11/804191", "/results/14/exeter/2022-03-11/804195", "/results/30/leicester/2022-03-11/804201", "/results/30/leicester/2022-03-11/804200", "/results/30/leicester/2022-03-11/804198", "/results/30/leicester/2022-03-11/804197", "/results/30/leicester/2022-03-11/804199", "/results/30/leicester/2022-03-11/804202", "/results/37/newcastle/2022-03-11/804181", "/results/37/newcastle/2022-03-11/804179", "/results/37/newcastle/2022-03-11/804182", "/results/37/newcastle/2022-03-11/804180", "/results/37/newcastle/2022-03-11/804177", "/results/37/newcastle/2022-03-11/804176", "/results/37/newcastle/2022-03-11/804178", "/results/513/wolverhampton-aw/2022-03-11/804352", "/results/513/wolverhampton-aw/2022-03-11/804353", "/results/513/wolverhampton-aw/2022-03-11/806925", "/results/513/wolverhampton-aw/2022-03-11/804350", "/results/513/wolverhampton-aw/2022-03-11/804354", "/results/513/wolverhampton-aw/2022-03-11/804349", "/results/513/wolverhampton-aw/2022-03-11/804351", "/results/1303/al-ain/2022-03-11/806926", "/results/1244/goulburn/2022-03-11/807045", "/results/869/sakhir/2022-03-11/806948", "/results/1244/goulburn/2022-03-11/807045", "/results/869/sakhir/2022-03-11/806948"]},
{"result": ["/results/8/carlisle/2022-03-10/804174", "/results/8/carlisle/2022-03-10/804172", "/results/8/carlisle/2022-03-10/804170", "/results/8/carlisle/2022-03-10/804175", "/results/8/carlisle/2022-03-10/804171", "/results/8/carlisle/2022-03-10/804173", "/results/8/carlisle/2022-03-10/805620", "/results/1353/newcastle-aw/2022-03-10/804340", "/results/1353/newcastle-aw/2022-03-10/804341", "/results/1353/newcastle-aw/2022-03-10/804338", "/results/1353/newcastle-aw/2022-03-10/804342", "/results/1353/newcastle-aw/2022-03-10/804337", "/results/1353/newcastle-aw/2022-03-10/804339", "/results/394/southwell-aw/2022-03-10/804346", "/results/394/southwell-aw/2022-03-10/804344", "/results/394/southwell-aw/2022-03-10/804345", "/results/394/southwell-aw/2022-03-10/804348", "/results/394/southwell-aw/2022-03-10/806779", "/results/394/southwell-aw/2022-03-10/804343", "/results/394/southwell-aw/2022-03-10/804347", "/results/394/southwell-aw/2022-03-10/806778", "/results/198/thurles/2022-03-10/806623", "/results/198/thurles/2022-03-10/806624", "/results/198/thurles/2022-03-10/806625", "/results/198/thurles/2022-03-10/806626", "/results/198/thurles/2022-03-10/806627", "/results/198/thurles/2022-03-10/806628", "/results/198/thurles/2022-03-10/806629", "/results/90/wincanton/2022-03-10/804183", "/results/90/wincanton/2022-03-10/804186", "/results/90/wincanton/2022-03-10/804188", "/results/90/wincanton/2022-03-10/804185", "/results/90/wincanton/2022-03-10/804187", "/results/90/wincanton/2022-03-10/804184", "/results/90/wincanton/2022-03-10/804189", "/results/219/saint-cloud/2022-03-10/807032", "/results/219/saint-cloud/2022-03-10/806812", "/results/219/saint-cloud/2022-03-10/806837", "/results/219/saint-cloud/2022-03-10/807033", "/results/219/saint-cloud/2022-03-10/807037", "/results/219/saint-cloud/2022-03-10/807041", "/results/219/saint-cloud/2022-03-10/807042", "/results/219/saint-cloud/2022-03-10/807043", "/results/219/saint-cloud/2022-03-10/807044", 
"/results/219/saint-cloud/2022-03-10/806837", "/results/219/saint-cloud/2022-03-10/807033"]}
]

Now, inside the "result" arrays there are some duplicates, for example /results/1244/goulburn/2022-03-11/807045.

How can I filter these duplicates out?
I found some solutions here on Stack Overflow that check for duplicate "result" arrays, but none that check whether anything inside an array is a duplicate. At least nothing I tried worked, but I guess I am messing something up.
I have tried for two days but could not work this one out on my own (or I am too stupid to find the answer in the similar questions here on Stack Overflow), and I have very, very limited Java knowledge.
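Before removing anything, it can help to confirm which entries actually repeat. The following is a minimal sketch using the standard library's collections.Counter on a trimmed excerpt of the data above; the helper name find_duplicates is hypothetical, not something from the question.

```python
from collections import Counter

# A trimmed excerpt of the scraped data from the question
data = [
    {"result": [
        "/results/1303/al-ain/2022-03-11/806926",
        "/results/1244/goulburn/2022-03-11/807045",
        "/results/869/sakhir/2022-03-11/806948",
        "/results/1244/goulburn/2022-03-11/807045",
        "/results/869/sakhir/2022-03-11/806948",
    ]},
    {"result": [
        "/results/219/saint-cloud/2022-03-10/806837",
        "/results/219/saint-cloud/2022-03-10/807033",
        "/results/219/saint-cloud/2022-03-10/807044",
        "/results/219/saint-cloud/2022-03-10/806837",
        "/results/219/saint-cloud/2022-03-10/807033",
    ]},
]


def find_duplicates(rows):
    """Return, per row, the entries that occur more than once in its "result" list."""
    return [
        [url for url, count in Counter(row["result"]).items() if count > 1]
        for row in rows
    ]


# Print the repeated URLs found in each row
for i, dups in enumerate(find_duplicates(data)):
    print(i, dups)
```

Counter counts every entry in one pass, so this only reports duplicates and leaves the data unchanged.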


I looked into converting the JSON into a list and then filtering out the duplicates, but that seemed way too clunky for a large file.
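On the clunkiness concern: json.load already returns a top-level JSON array as a Python list of dicts, so there is no separate conversion step, and a set-based membership check keeps each row to a single linear pass. A minimal read-modify-write sketch, assuming the file holds a single JSON array like the one above; the function names dedupe_rows and dedupe_json_file and any file paths are hypothetical:

```python
import json


def dedupe_rows(data):
    """Drop repeated entries inside each row's "result" list, keeping the
    first occurrence of each URL in its original position."""
    for row in data:
        seen = set()
        unique = []
        for url in row["result"]:
            if url not in seen:  # average O(1) set lookup
                seen.add(url)
                unique.append(url)
        row["result"] = unique  # modifies the row dict in place
    return data


def dedupe_json_file(in_path, out_path):
    """Load a JSON array of {"result": [...]} objects, deduplicate it, save it."""
    with open(in_path, encoding="utf-8") as f:
        data = json.load(f)  # top-level JSON array -> Python list of dicts
    dedupe_rows(data)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

For example, dedupe_json_file("results.json", "results_deduped.json") would write the cleaned array to a new file and leave the original untouched.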

Solution:

Once you have all the JSON data loaded, you can map over the rows, remove the duplicated entries with set, and convert back to a list to preserve the original structure (note that a set does not keep the original order of the URLs):

import json
with open('results.json') as f:  # placeholder filename for the large scraped file
    data = json.load(f)  # large JSON data list
data = list(map(lambda x: {'result': list(set(x['result']))}, data))
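If the scraped order of the URLs matters, a variant of the same idea can swap set for dict.fromkeys, which keeps the first occurrence of each entry in insertion order (dicts guarantee this since Python 3.7). This sketch also copies any other keys a row might have via {**row, ...} rather than rebuilding each object with only "result"; dedupe_results is a hypothetical name:

```python
def dedupe_results(data):
    """Return new rows whose "result" lists contain each URL only once.

    dict.fromkeys keeps the first occurrence of each key in insertion order,
    so the URLs stay in their scraped order. {**row, ...} copies any other
    keys a row might carry instead of keeping only "result".
    """
    return [{**row, "result": list(dict.fromkeys(row["result"]))} for row in data]


data = [{"result": [
    "/results/1244/goulburn/2022-03-11/807045",
    "/results/869/sakhir/2022-03-11/806948",
    "/results/1244/goulburn/2022-03-11/807045",
    "/results/869/sakhir/2022-03-11/806948",
]}]
print(dedupe_results(data))
# [{'result': ['/results/1244/goulburn/2022-03-11/807045', '/results/869/sakhir/2022-03-11/806948']}]
```

Unlike the set version, the output lists come back in the same order the URLs first appeared in.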