Create a JSON string with the right content in Python

First, I’m a newbie in Python, but I have already developed in other programming languages (mainly C++, PHP and Java). I am having trouble making Python do what I want: create a correct JSON string. My problem is not creating the JSON string itself but its content.
Let me explain. I have this code:

import json
import spacy
def eng_pos(textstr):    
    x = english() #A class which i developed to get the infinitive form of each verb
    data = {}
    nlp = spacy.load("en_core_web_trf")
    doc = nlp(textstr)
    for token in doc:
        data[token.text] = token.pos_
        if token.pos_ == "VERB":
            data['Infinitive'] = x.infinitive(token.text)
        print(token.text+" "+token.pos_)
    json_data = json.dumps(data)
    return json_data

It essentially builds a dictionary containing the part of speech (POS) of each word and, for each verb, the tense and the infinitive form. It also prints each token and its POS. Once that is done, it dumps everything into a JSON string, prints it on the screen, and then returns it. So far so good: it gives me valid JSON, but without the right content.

For reference, I used the following sentences as textstr, as an example:

"IMAGINE, IF YOU will, a toy boat that might fit in the palm of your hand. At mid-ship add a squat spool of sewing thread lying on its side. Scale that up about a thousand-fold and the result is the 150-metre-long Nexans Aurora. The thread in question is kilometres of high-voltage power line ready to be deployed from the aft of the ship across the sea floor."

This gives me the following JSON:

{"IMAGINE": "VERB", "Tense": ["Past"], "Infinitive": "deploy", ",": "PUNCT", "IF": "SCONJ", "YOU": "PRON", "will": "AUX", "a": "DET", "toy": "NOUN", "boat": "NOUN", "that": "SCONJ", "might": "AUX", "fit": "VERB", "in": "ADP", "the": "DET", "palm": "NOUN", "of": "ADP", "your": "PRON", "hand": "NOUN", ".": "PUNCT", "At": "ADP", "mid": "NOUN", "-": "PUNCT", "ship": "NOUN", "add": "VERB", "squat": "ADJ", "spool": "NOUN", "sewing": "NOUN", "thread": "NOUN", "lying": "VERB", "on": "ADP", "its": "PRON", "side": "NOUN", "Scale": "VERB", "up": "ADP", "about": "ADP", "thousand": "ADV", "fold": "ADV", "and": "CCONJ", "result": "NOUN", "is": "AUX", "150": "NUM", "metre": "NOUN", "long": "ADJ", "Nexans": "PROPN", "Aurora": "PROPN", "The": "DET", "question": "NOUN", "kilometres": "NOUN", "high": "ADJ", "voltage": "NOUN", "power": "NOUN", "line": "NOUN", "ready": "ADJ", "to": "PART", "be": "AUX", "deployed": "VERB", "from": "ADP", "aft": "NOUN", "across": "ADP", "sea": "NOUN", "floor": "NOUN"}

If you look at the JSON string closely, you will notice that the tense and the infinitive form are given only for the last verb, "deployed", in the last sentence of the paragraph, and not for every verb in this short paragraph, which is what I want. Why is only the last verb taken into account and the other verbs ignored? I think this has something to do with my Python code, as everything else is correct. I have been stuck for two days and I cannot see where the problem lies, so please help me if you can.

>Solution :

That is because you write to the dictionary under the same two keys, Tense and Infinitive, for every verb, so each write overwrites the previous value.
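To see the overwriting in isolation, here is a minimal sketch that does not need spaCy. The `verbs` list is a hypothetical stand-in for the verbs found in the text, and the infinitives are hard-coded rather than coming from the `english` class in the question.

```python
import json

# Hypothetical (word, infinitive) pairs standing in for the verbs spaCy finds.
verbs = [("IMAGINE", "imagine"), ("fit", "fit"), ("deployed", "deploy")]

data = {}
for word, infinitive in verbs:
    data[word] = "VERB"
    # Same key on every iteration: each assignment replaces the previous value.
    data["Infinitive"] = infinitive

flat_json = json.dumps(data)
print(flat_json)
```

Only the infinitive of the last verb survives. Because a dict keeps a key in the position where it was first inserted, "Infinitive" still shows up right after the first verb, which matches the output in the question.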

You most likely want to store a nested dict for each word, which holds not only the pos_ but also the Tense and Infinitive:

data[token.text] = {"pos": token.pos_}
if token.pos_ == "VERB":
    data[token.text]['Infinitive'] = x.infinitive(token.text)
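For a version you can run without spaCy or the `english` class, here is a rough sketch of the same idea. The function name `build_pos_dict`, the `tokens` list and the `infinitives` lookup are all made up for illustration; in the real code the pairs would come from `token.text` and `token.pos_`, and the lookup from `x.infinitive()`.

```python
import json

def build_pos_dict(tokens, infinitive_of):
    """Map each word to a dict with its POS tag and, for verbs, its infinitive."""
    data = {}
    for text, pos in tokens:
        entry = {"pos": pos}
        if pos == "VERB":
            # The infinitive is stored inside this word's own entry,
            # so it can no longer be replaced by a later verb.
            entry["Infinitive"] = infinitive_of(text)
        data[text] = entry
    return data

# Hypothetical stand-ins for the spaCy tokens and the infinitive lookup.
tokens = [("be", "AUX"), ("deployed", "VERB"), ("floor", "NOUN")]
infinitives = {"deployed": "deploy"}

result = build_pos_dict(tokens, lambda word: infinitives.get(word, word))
print(json.dumps(result, indent=4))
```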

This will produce something like this:

    "deployed": {
        "pos": "VERB",
        "Tense": ["Past"],
        "Infinitive": "deploy"
    },
    "floor": {
        "pos": "NOUN"
    }

Keep in mind, however, that this will still overwrite the data for duplicate words. As long as the result is the same every time a given word appears, this is probably fine.
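If duplicates do matter, for example when the same word gets different tags in different places, one option is to collect a list with one entry per token instead of a dict keyed by the word. This is only a sketch under that assumption; `build_token_list` and the sample tags are hypothetical and not taken from the question.

```python
import json

def build_token_list(tokens, infinitive_of):
    """Return one dict per token, in sentence order, so duplicates are kept."""
    entries = []
    for text, pos in tokens:
        entry = {"text": text, "pos": pos}
        if pos == "VERB":
            entry["Infinitive"] = infinitive_of(text)
        entries.append(entry)
    return entries

# Hypothetical tokens and tags, chosen to show one word with two different tags.
tokens = [("that", "SCONJ"), ("fit", "VERB"), ("that", "PRON")]

entries = build_token_list(tokens, lambda word: word)
print(json.dumps(entries))
```

The trade-off is that you can no longer look a word up directly by key; you have to iterate over the list.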
