Extract value from anomalous dictionary

I have a set of strings formatted as BBB below, and I need to extract the value corresponding to the text key (in the example below it’s "My life is amazing").

BBB = str({"id": "18976", "episode_done": False, "text": "My life is amazing", 
    "text_candidates": ["My life is amazing", "I am worried about global warming"], 
    "metrics": {"clen": AverageMetric(12), "ctrunc": AverageMetric(0), 
    "ctrunclen": AverageMetric(0)}})

I tried converting BBB into a string and then into a dictionary using json.load and ast.literal_eval, but I get error messages in both cases. I suspect this is because the metrics key has a dictionary as its value.

How would you suggest solving this? Thanks.


Solution:

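The nested dictionary is not the problem: the AverageMetric(...) values are function calls rather than literals, so ast.literal_eval() rejects them, and the string is not valid JSON either (for example, False instead of false). You could adapt the source of ast.literal_eval() into something that parses function calls (and other non-literals), but keeps them as strings: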

import ast

BBB = """
{"id": "18976", "episode_done": False, "text": "My life is amazing", 
    "text_candidates": ["My life is amazing", "I am worried about global warming"], 
    "metrics": {"clen": AverageMetric(12), "ctrunc": AverageMetric(0), 
    "ctrunclen": AverageMetric(0)}}
""".strip()


def literal_eval_with_function_calls(source):
    # Adapted from `ast.literal_eval`
    def _convert(node):
        if isinstance(node, list):
            return [_convert(arg) for arg in node]
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Tuple):
            return tuple(map(_convert, node.elts))
        if isinstance(node, ast.List):
            return list(map(_convert, node.elts))
        if isinstance(node, ast.Set):
            return set(map(_convert, node.elts))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == 'set' and node.args == node.keywords == []:
            return set()
        if isinstance(node, ast.Dict):
            return dict(zip(map(_convert, node.keys), map(_convert, node.values)))
        if isinstance(node, ast.Expression):
            return _convert(node.body)
        return {
            f'${node.__class__.__name__}': ast.get_source_segment(source, node),
        }

    return _convert(ast.parse(source, mode='eval'))


print(literal_eval_with_function_calls(BBB))

This outputs (shown here pretty-printed, with keys sorted, as pprint.pprint would display it):

{'episode_done': False,
 'id': '18976',
 'metrics': {'clen': {'$Call': 'AverageMetric(12)'},
             'ctrunc': {'$Call': 'AverageMetric(0)'},
             'ctrunclen': {'$Call': 'AverageMetric(0)'}},
 'text': 'My life is amazing',
 'text_candidates': ['My life is amazing', 'I am worried about global warming']}

However, it would be better to have the data in a parseable format to begin with.
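If only the value of a single key such as text is needed, a narrower variant of the same idea is to parse the string into an AST and evaluate just that one value, so the AverageMetric(...) calls are never touched. The following is a minimal sketch, not part of the original answer: the helper name extract_key and the shortened sample string are hypothetical, and it assumes Python 3.8+ and that the string is a single top-level dict literal.

```python
import ast


def extract_key(source, key):
    """Return the literal value stored under `key` in a top-level dict literal.

    Hypothetical helper: only the requested value is evaluated, so
    non-literal values elsewhere in the dict (e.g. AverageMetric(12))
    do not cause an error.
    """
    tree = ast.parse(source.strip(), mode="eval")
    if not isinstance(tree.body, ast.Dict):
        raise ValueError("expected a dict literal")
    for key_node, value_node in zip(tree.body.keys, tree.body.values):
        # key_node is None for `**mapping` entries; isinstance() skips those
        if isinstance(key_node, ast.Constant) and key_node.value == key:
            # literal_eval accepts an AST node as well as a string
            return ast.literal_eval(value_node)
    raise KeyError(key)


# Shortened version of the string from the question
sample = (
    '{"id": "18976", "episode_done": False, "text": "My life is amazing", '
    '"metrics": {"clen": AverageMetric(12), "ctrunc": AverageMetric(0)}}'
)

print(extract_key(sample, "text"))  # prints: My life is amazing
```

Asking for a key whose value is itself not a literal (for example "metrics" here) would still raise ValueError from ast.literal_eval, which seems a reasonable failure mode for this use case.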
