Parsing string to dictionary

I’m working on a communications project with a radio that transmits a formatted string message, similar to:

message_string = 'Transmission\n variables \n  0.01 First variable\n  0.02 Second variable\n  0.03 Third variable \n More variables\n  0.03 Next variable\n  0.04 Another variable'

When printed, this looks like

print(message_string)
Transmission
 variables
  0.01 First variable
  0.02 Second variable
  0.03 Third variable
 More variables
  0.03 Next variable
  0.04 Another variable

This is easy for humans to read but tricky to parse, especially since I am trying to convert it to a Python dictionary. My actual system has many more of these variables, and the code needs to process all of them into a dictionary systematically.

I think it might include something like

message_string = message_string.replace('\n','{')

but deciding which direction of brackets to use in different cases, and where to put the colons for the dictionary, is confusing me. I want an output similar to

message_dict = {
    'variables': {
        'First variable': 0.01,
        'Second variable': 0.02,
        'Third variable': 0.03,
    },
    'More variables': {
        'Next variable': 0.03,
        'Another variable': 0.04,
    },
}

where an error would not be thrown if one of the variables was missing from the transmission (since that sometimes happens). How do I convert this string into a dictionary?

Solution:

Assuming that the indentation increases by one space per nesting level, you could use this stack-based solution:

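def to_dict(s):
    result = {}
    # stack[i] holds the dict that lines indented by i spaces are added to
    stack = [result]
    for line in s.splitlines():
        stripped = line.lstrip()
        if not stripped:
            continue  # skip blank lines, which would otherwise raise IndexError
        indent = len(line) - len(stripped) + 1
        if indent >= len(stack):
            stack.append(None)  # make room for one deeper level
        if stripped[0].isdigit():
            # "0.01 First variable" -> key "First variable", value 0.01
            value, key = stripped.split(" ", 1)
            stack[indent-1][key] = float(value)
        else:
            # a header line opens a new nested dict at the next level
            stack[indent-1][stripped] = stack[indent] = {}

    return result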

Call it like this:

message_string = 'Transmission\n variables \n  0.01 First variable\n  0.02 Second variable\n  0.03 Third variable \n More variables\n  0.03 Next variable\n  0.04 Another variable'
d = to_dict(message_string)

For this example d will be:

{
    'Transmission': {
        'variables ': {
            'First variable': 0.01, 
            'Second variable': 0.02, 
            'Third variable ': 0.03
        }, 
        'More variables': {
            'Next variable': 0.03, 
            'Another variable': 0.04
        }
    }
}

Compared to what you wrote, this has the extra level of Transmission, but as this really is part of the input, I kept it like that.
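
Two details of that output may matter in practice: keys such as 'variables ' and 'Third variable ' keep the trailing spaces from the transmission, and the parser relies on exactly one space per nesting level. The following is a minimal sketch of a variant that relaxes both, not a drop-in replacement. The name parse_message is hypothetical, indentation is assumed to use spaces only, and any line whose first word parses as a float is assumed to be a value line.

```python
def parse_message(text):
    """Parse an indented transmission into nested dicts (hypothetical helper)."""
    root = {}
    # Each stack entry is (indent_width, dict_for_children); -1 marks the root.
    stack = [(-1, root)]
    for raw in text.splitlines():
        line = raw.rstrip()              # drop trailing spaces so keys come out clean
        if not line:
            continue                     # ignore blank or whitespace-only lines
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        # Close every section that sits at this depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        first_word, _, rest = stripped.partition(" ")
        try:
            number = float(first_word)   # e.g. "0.01 First variable"
        except ValueError:
            section = parent[stripped] = {}   # a header such as "variables"
            stack.append((indent, section))
        else:
            parent[rest.strip()] = number
    return root


message_string = 'Transmission\n variables \n  0.01 First variable\n  0.02 Second variable\n  0.03 Third variable \n More variables\n  0.03 Next variable\n  0.04 Another variable'

parsed = parse_message(message_string)
# Drop the outer 'Transmission' level to get the shape asked for in the question.
message_dict = parsed.get("Transmission", {})
# Chained .get() calls return None instead of raising KeyError when a
# variable (or a whole section) is missing from the transmission.
missing = message_dict.get("variables", {}).get("Fourth variable")
```

Because the stack stores the indentation width of each open section rather than using it as a list index, two or four spaces per level are handled the same way as one. Reading values with .get() is one way to cope with variables that are occasionally absent; whether None is the right fallback depends on what the rest of the system expects.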
