Unable to parse JSON string obtained from attribute in an HTML tag in Python

I am making an AJAX call to an endpoint (I didn’t create this API) whose response is JSON. Within the JSON there is a key called content of type string. This content appears to be HTML that contains some JSON. I want to parse the JSON contained within the HTML, but I keep getting the following error when I call json.loads() on the string:

{JSONDecodeError}JSONDecodeError('Expecting property name enclosed in double quotes: line 1 column 2 (char 1)')

I don’t really understand why I am getting this error.
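
A minimal sketch of why the decoder stops at character 1, assuming the string really does begin with `{\"` as in the payload shown further down (the `sample` value here is a shortened, hypothetical stand-in). After `{`, the parser expects either `}` or a `"` that opens a key, but it finds a backslash instead:

```python
import json

# Shortened, hypothetical stand-in for the real payload: every quote is
# preceded by a literal backslash, as in the string returned by the API.
sample = r'{\"name\":\"ThreadMainListItemNormalizer\"}'

try:
    json.loads(sample)
except json.JSONDecodeError as exc:
    # exc.pos is the 0-based offset of the character the parser rejected.
    error_message = exc.msg
    error_pos = exc.pos
    offending_char = sample[exc.pos]
    print(error_message, error_pos, repr(offending_char))
```

The reported offset points at the backslash, which suggests the text is JSON that has been escaped one extra time rather than plain JSON.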

Here is the JSON string I am trying to parse:

{\"name\":\"ThreadMainListItemNormalizer\",\"props\":{\"thread\":{\"threadId\":4369992,\"threadTypeId\":1,\"titleSlug\":\"sebamed-sale-extra-soft-baby-cream-ps239-anti-dandruff-shampoo-ps387\",\"title\":\"Sebamed sale - extra soft baby cream \£2.39 / anti dandruff shampoo \£3.87\",\"currentUserVoteDirection\":\"\",\"commentCount\":0,\"status\":\"Activated\",\"isExpired\":false,\"isNew\":true,\"isPinned\":false,\"isTrending\":null,\"isBookmarked\":false,\"isLocal\":false,\"temperature\":0,\"temperatureLevel\":\"\",\"type\":\"Deal\",\"nsfw\":false,\"deletedAt\":null,\"publishedAt\":1720003748,\"voucherCode\":\"\",\"link\":\"https://www.justmylook.com/sebamed-m583\",\"merchant\":{\"merchantId\":45518,\"merchantName\":\"Justmylook\",\"merchantUrlName\":\"justmylook.co.uk\",\"isMerchantPageEnabled\":true},\"price\":2.39,\"nextBestPrice\":0,\"percentage\":0,\"discountType\":null,\"shipping\":{\"isFree\":1,\"price\":0},\"user\":{\"userId\":2701300,\"username\":\"Manish_N\",\"title\":\"\",\"avatar\":{\"path\":\"users/raw/default\",\"name\":\"2701300_6\",\"slotId\":\"default\",\"width\":0,\"height\":0,\"version\":6,\"unattached\":false,\"uid\":\"2701300_6.raw\",\"ext\":\"raw\"},\"persona\":{\"text\":null,\"type\":null},\"isBanned\":false,\"isDeletedOrPendingDeletion\":false,\"isUserProfileHidden\":false}}}}

If I paste the above JSON string into this online JSON validator tool, it says that it is invalid JSON. However, when I unescape the JSON using this tool, I get the following output:

"name":"ThreadMainListItemNormalizer","props":{"thread":{"threadId":4369991,"threadTypeId":1,"titleSlug":"samsung-55-qn700c-neo-qled-8k-hdr-smart-tv","title":"Samsung 55\" QN700C Neo QLED 8K HDR Smart TV Sold by Reliant Direct FBA","currentUserVoteDirection":"","commentCount":0,"status":"Activated","isExpired":false,"isNew":true,"isPinned":false,"isTrending":null,"isBookmarked":false,"isLocal":false,"temperature":0.59,"temperatureLevel":"Hot1","type":"Deal","nsfw":false,"deletedAt":null,"publishedAt":1720003637,"voucherCode":"","link":"https://www.amazon.co.uk/dp/B0BWFNLPTP?smid=A2CN43WDI0AWCL","merchant":{"merchantId":1650,"merchantName":"Amazon","merchantUrlName":"amazon-uk","isMerchantPageEnabled":true},"price":999,"nextBestPrice":1198,"percentage":0,"discountType":null,"shipping":{"isFree":1,"price":0},"user":{"userId":2679277,"username":"ben.jammin","title":"","avatar":{"path":"users/raw/default","name":"2679277_1","slotId":"default","width":0,"height":0,"version":1,"unattached":false,"uid":"2679277_1.raw","ext":"raw"},"persona":{"text":null,"type":null},"isBanned":false,"isDeletedOrPendingDeletion":false,"isUserProfileHidden":false}}}}

which is in fact valid JSON. My issue arises when I try to replicate the unescape tool and unescape the string within Python.

I have tried the following solutions:

  • Using ast.literal_eval(), but I get the following error:

    {SyntaxError}SyntaxError('unexpected character after line continuation character', ('<unknown>', 1, 3, '{\\"name\\":\\"ThreadMainListItemNo...:null,\\"type\\":null},\\"isBanned\\":false,\\"isDeletedOrPendingDeletion\\":false,\\"isUserProfileHidden\\":false}}}}', 1, 0))
    
  • Using the .encode('raw_unicode_escape').decode('unicode_escape') method outlined here, but after calling json.loads() on the unescaped string I get the following error:

    {JSONDecodeError}JSONDecodeError('Invalid \\escape: line 1 column 224 (char 223)')
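
The two attempts above can be reproduced on a small scale. This is only a sketch, assuming a shortened, hypothetical `sample` that has the same shape as the real payload, including the `\£` sequence:

```python
import ast
import json
import warnings

# Shortened, hypothetical stand-in for the payload, including the \£ sequence.
sample = r'{\"title\":\"baby cream \£2.39\"}'

# Attempt 1: the text is not a Python literal. A backslash outside a string
# literal is read as a line continuation, so parsing fails immediately.
try:
    ast.literal_eval(sample)
    literal_eval_error = None
except SyntaxError as exc:
    literal_eval_error = exc.msg

# Attempt 2: unicode_escape turns \" into " but does not recognise \£,
# so it leaves that backslash in place (and emits a warning about it).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    decoded = sample.encode("raw_unicode_escape").decode("unicode_escape")

try:
    json.loads(decoded)
    json_error = None
except json.JSONDecodeError as exc:
    json_error = exc.msg

print(literal_eval_error)
print(decoded)
print(json_error)
```

In this sketch the second attempt gets most of the way there: the quotes are unescaped, and only the leftover `\£` keeps json.loads() from accepting the result.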
    

UPDATE:

I think the issue is that I have some invalid escape characters in the string, e.g. \£. I followed the solution here and it resolved my issue.
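
For reference, JSON only allows a backslash before `"`, `\`, `/`, `b`, `f`, `n`, `r`, `t`, or `u`. One way to locate anything else is a small scan like the one below. It is a sketch rather than the linked solution: `find_invalid_escapes` and `fragment` are hypothetical names and data introduced for illustration.

```python
import re

# JSON allows a backslash only before one of these characters
# (\u must additionally be followed by four hex digits, not checked here).
VALID_AFTER_BACKSLASH = set('"\\/bfnrtu')

def find_invalid_escapes(text):
    """Return (offset, sequence) pairs for backslash escapes JSON rejects."""
    return [
        (m.start(), m.group(0))
        for m in re.finditer(r'\\(.)', text)
        if m.group(1) not in VALID_AFTER_BACKSLASH
    ]

# Hypothetical fragment of a JSON document (quotes already unescaped).
fragment = r'{"title":"baby cream \£2.39 / shampoo \£3.87 \"sale\""}'
print(find_invalid_escapes(fragment))
```

Matching the backslash together with the character after it means a legitimate `\\` pair is consumed as a unit and is never mistaken for the start of another escape.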

Does anyone have any idea why this API might be including an escaped £ symbol?

>Solution :

Here is one way to handle it:

import json

string = "{\"name\":\"ThreadMainListItemNormalizer\",\"props\":{\"thread\":{\"threadId\":4369992,\"threadTypeId\":1,\"titleSlug\":\"sebamed-sale-extra-soft-baby-cream-ps239-anti-dandruff-shampoo-ps387\",\"title\":\"Sebamed sale - extra soft baby cream \£2.39 / anti dandruff shampoo \£3.87\",\"currentUserVoteDirection\":\"\",\"commentCount\":0,\"status\":\"Activated\",\"isExpired\":false,\"isNew\":true,\"isPinned\":false,\"isTrending\":null,\"isBookmarked\":false,\"isLocal\":false,\"temperature\":0,\"temperatureLevel\":\"\",\"type\":\"Deal\",\"nsfw\":false,\"deletedAt\":null,\"publishedAt\":1720003748,\"voucherCode\":\"\",\"link\":\"https://www.justmylook.com/sebamed-m583\",\"merchant\":{\"merchantId\":45518,\"merchantName\":\"Justmylook\",\"merchantUrlName\":\"justmylook.co.uk\",\"isMerchantPageEnabled\":true},\"price\":2.39,\"nextBestPrice\":0,\"percentage\":0,\"discountType\":null,\"shipping\":{\"isFree\":1,\"price\":0},\"user\":{\"userId\":2701300,\"username\":\"Manish_N\",\"title\":\"\",\"avatar\":{\"path\":\"users/raw/default\",\"name\":\"2701300_6\",\"slotId\":\"default\",\"width\":0,\"height\":0,\"version\":6,\"unattached\":false,\"uid\":\"2701300_6.raw\",\"ext\":\"raw\"},\"persona\":{\"text\":null,\"type\":null},\"isBanned\":false,\"isDeletedOrPendingDeletion\":false,\"isUserProfileHidden\":false}}}}"
# The only backslashes left in `string` precede the £ signs, and \£ is not a
# valid JSON escape, so drop them. false/true/null are left untouched so that
# json.loads() maps them to False/True/None.
string = string.replace('\\', '')
result = json.loads(string)
print(result)

Result in terminal:

{'name': 'ThreadMainListItemNormalizer',
 'props': {'thread': {'threadId': 4369992,
   'threadTypeId': 1,
   'titleSlug': 'sebamed-sale-extra-soft-baby-cream-ps239-anti-dandruff-shampoo-ps387',
   'title': 'Sebamed sale - extra soft baby cream £2.39 / anti dandruff shampoo £3.87',
   'currentUserVoteDirection': '',
   'commentCount': 0,
   'status': 'Activated',
   'isExpired': False,
   'isNew': True,
   'isPinned': False,
   'isTrending': None,
   'isBookmarked': False,
   'isLocal': False,
   'temperature': 0,
   'temperatureLevel': '',
   'type': 'Deal',
   'nsfw': False,
   'deletedAt': None,
   'publishedAt': 1720003748,
   'voucherCode': '',
   'link': 'https://www.justmylook.com/sebamed-m583',
   'merchant': {'merchantId': 45518,
    'merchantName': 'Justmylook',
    'merchantUrlName': 'justmylook.co.uk',
    'isMerchantPageEnabled': True},
   'price': 2.39,
   'nextBestPrice': 0,
   'percentage': 0,
   'discountType': None,
   'shipping': {'isFree': 1, 'price': 0},
   'user': {'userId': 2701300,
    'username': 'Manish_N',
    'title': '',
    'avatar': {'path': 'users/raw/default',
     'name': '2701300_6',
     'slotId': 'default',
     'width': 0,
     'height': 0,
     'version': 6,
     'unattached': False,
     'uid': '2701300_6.raw',
     'ext': 'raw'},
    'persona': {'text': None, 'type': None},
    'isBanned': False,
    'isDeletedOrPendingDeletion': False,
    'isUserProfileHidden': False}}}}
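
Stripping every backslash, as in the answer above, also removes legitimate escapes, for example an escaped quote inside a title such as the Samsung 55\" one in the question. A more conservative sketch, assuming the raw text is JSON escaped one extra time with a few stray invalid escapes like `\£`, removes only the invalid backslashes and then decodes twice. The helper names and the `sample` string are hypothetical and introduced here for illustration:

```python
import json
import re

# Characters that may legally follow a backslash in a JSON string.
VALID_AFTER_BACKSLASH = '"\\/bfnrtu'

def drop_invalid_escapes(text):
    """Remove the backslash from escapes JSON does not define (e.g. \\£)."""
    def fix(match):
        char = match.group(1)
        return match.group(0) if char in VALID_AFTER_BACKSLASH else char
    return re.sub(r'\\(.)', fix, text)

def parse_escaped_json(text):
    """Parse JSON whose quotes are backslash-escaped one level too many."""
    cleaned = drop_invalid_escapes(text)
    # Treat the cleaned text as the body of a JSON string literal: the first
    # loads() undoes the extra escaping, the second parses the document.
    inner = json.loads('"' + cleaned + '"')
    return json.loads(inner)

# Hypothetical sample: escaped quotes, an invalid \£, and a quote inside a value.
sample = r'{\"title\":\"cream \£2.39, 55\\\" TV\",\"isNew\":true,\"deletedAt\":null}'
result = parse_escaped_json(sample)
print(result)
```

Because the booleans and null are never touched, they come back as real `True`, `False`, and `None` values rather than strings, and the quote inside the title survives intact.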