Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to remove text between two delimiters in Python

I am trying to remove all text between the [] brackets after the phrase ‘"segmentation":’ Please see below snippet from file for context.

 "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [
                [
                    621.63,
                    1085.67,
                    621.63,
                    1344.71,
                    841.66,
                    1344.71,
                    841.66,
                    1085.67
                ]
            ],
            "iscrowd": 0,
            "bbox": [
                621.63,
                1085.67,
                220.02999999999997,
                259.03999999999996
            ],
            "area": 56996,
            "category_id": 1124044
        },
        {
            "id": 2,
            "image_id": 1,
            "segmentation": [
                [
                    887.62,
                    1355.7,
                    887.62,
                    1615.54,
                    1114.64,
                    1615.54,
                    1114.64,
                    1355.7
                ]
            ],
            "iscrowd": 0,
            "bbox": [
                887.62,
                1355.7,
                227.0200000000001,
                259.8399999999999
            ],
            "area": 58988,
            "category_id": 1124044
        },
        {
            "id": 3,
            "image_id": 1,
            "segmentation": [
                [
                    1157.61,
                    1411.84,
                    1157.61,
                    1661.63,
                    1404.89,
                    1661.63,
                    1404.89,
                    1411.84
                ]
            ],
            "iscrowd": 0,
            "bbox": [
                1157.61,
                1411.84,
                247.2800000000002,
                249.7900000000002
            ],
            "area": 61768,
            "category_id": 1124044
        },
        ........... and so on.....

I ultimately just want to delete all text between the square brackets after the word segmentation appears. In other words, the output to look like (for the first instance):

"annotations": [
            {
                "id": 1,
                "image_id": 1,
                "segmentation": [],
                "iscrowd": 0,
                "bbox": [
                    621.63,
                    1085.67,
                    220.02999999999997,
                    259.03999999999996
                ],
                "area": 56996,
                "category_id": 1124044
            },

I’ve tried using the below code, but not quite having the luck currently. Is there something I am getting wrong due to the new lines?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import re
f = open('samplfile.json')
text = f.read()
f.close()

clean = re.sub('"segmentation":(.*)\]', '', text)

print(clean)

f = open('cleanedfile.json', 'w')
f.write(clean)
f.close()

I appreciate that the exact positioning I have for the [s in the clean line may not be quite right, but this code isn’t removing anything at the moment. Any help would be much appreciated!

Many thanks!

>Solution :

Python has a built in json module for parsing and modifying JSON. A regular expression is likely to be fragile and more headache than it’s probably worth.

You can do the following:

import json

with open('samplfile.json') as input_file, open('output.json', 'w') as output_file:
    data = json.load(input_file)
    for i in range(len(data['annotations'])):
        data['annotations'][i]['segmentation'] = []

    json.dump(data, output_file, indent=4)

Then, output.json contains:

{
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                621.63,
                1085.67,
                220.02999999999997,
                259.03999999999996
            ],
            "area": 56996,
            "category_id": 1124044
        },
        {
            "id": 2,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                887.62,
                1355.7,
                227.0200000000001,
                259.8399999999999
            ],
            "area": 58988,
            "category_id": 1124044
        },
        {
            "id": 3,
            "image_id": 1,
            "segmentation": [],
            "iscrowd": 0,
            "bbox": [
                1157.61,
                1411.84,
                247.2800000000002,
                249.7900000000002
            ],
            "area": 61768,
            "category_id": 1124044
        }
    ]
}
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading