I have the following text, which I extracted from a <script> tag:
function fbq_w123456as() {
fbq('track', 'AddToCart', {
contents: [
{
'id': '123456',
'quantity': '',
'item_price':69.99 }
],
content_name: 'Stackoverflow',
content_category: '',
content_ids: ['w123456as'],
content_type: 'product',
value: 420.69,
currency: 'USD'
});
}
I'm trying to extract this information with a regex and then convert it to JSON using Python.
I've tried re.search() with the pattern r"'AddToCart', (.*?);" and a few other attempts, but with no luck. I'm very new to regex and am struggling with it. This is the JSON I'd like to end up with:
{
"contents":[
{
"id":"123456",
"quantity":"",
"item_price":69.99
}
],
"content_name":"Stackoverflow",
"content_category":"",
"content_ids":[
"w123456as"
],
"content_type":"product",
"value":420.69,
"currency":"USD"
}
How would I create the regex to extract the JSON data?
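The attempt in the question most likely returns None because, by default, "." in a Python regex does not match a newline, and the object after 'AddToCart', starts with "{" followed by a line break. Below is a minimal sketch comparing the question's pattern with and without re.DOTALL (the same flag as re.S), plus a tighter pattern that captures only the braces. The js_txt string is a shortened stand-in for the real script text, not the full original.

```python
import re

# Shortened stand-in for the script text in the question.
js_txt = """fbq('track', 'AddToCart', {
contents: [{'id': '123456'}],
value: 420.69
});"""

# Without re.DOTALL, "." cannot cross the newline right after "{", so the
# pattern never reaches a ";" and re.search returns None.
no_dotall = re.search(r"'AddToCart', (.*?);", js_txt)

# With re.DOTALL the same pattern matches, but the captured text also
# includes the ")" that closes the fbq(...) call.
loose = re.search(r"'AddToCart', (.*?);", js_txt, flags=re.DOTALL).group(1)

# Requiring "{" ... "}" followed by ");" captures only the object literal.
tight = re.search(r"'AddToCart', (\{.*?\})\);", js_txt, flags=re.DOTALL).group(1)

print(no_dotall)   # None
print(loose[-2:])  # })
print(tight[-1])   # }
```

The last pattern is the one used in the solution: capturing \{.*?\} and then requiring \); after it keeps the call's closing parenthesis out of the group.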
Solution:
You can try:
import re
from ast import literal_eval
js_txt = """\
function fbq_w123456as() {
fbq('track', 'AddToCart', {
contents: [
{
'id': '123456',
'quantity': '',
'item_price':69.99 }
],
content_name: 'Stackoverflow',
content_category: '',
content_ids: ['w123456as'],
content_type: 'product',
value: 420.69,
currency: 'USD'
});
}"""
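# Capture the object literal: from the "{" after 'AddToCart', up to the first
# "}" that is directly followed by ");". re.S lets "." match newlines.
out = re.search(r"'AddToCart', (\{.*?\})\);", js_txt, flags=re.S).group(1)
# Put double quotes around bare JavaScript keys (contents:, value:, ...).
# Keys that are already quoted, like 'id':, are left alone because quote
# characters cannot be part of the match.
out = re.sub(r"""([^"'\s]+):""", r'"\1":', out)
# Evaluate the resulting Python dict literal safely.
out = literal_eval(out)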
print(out)
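This prints the following Python dict (laid out here over several lines for readability; print() itself writes it on a single line):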
{
"contents": [{"id": "123456", "quantity": "", "item_price": 69.99}],
"content_name": "Stackoverflow",
"content_category": "",
"content_ids": ["w123456as"],
"content_type": "product",
"value": 420.69,
"currency": "USD",
}
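The solution ends with a Python dict, while the question asks for JSON; json.dumps from the standard library handles that last step. Below is a sketch that wraps the solution's steps in one helper. The name fbq_payload_to_json, its event parameter, and the slightly stricter key pattern (quote an identifier only when it directly follows "{" or "," after optional whitespace) are illustrative assumptions, not part of the original answer. Like any regex over JavaScript it is a heuristic: a string value that itself contains something like 'x, y: z' would still be mangled, and JavaScript true/false/null would make literal_eval fail.

```python
import json
import re
from ast import literal_eval


def fbq_payload_to_json(js_text: str, event: str = "AddToCart") -> str:
    """Hypothetical helper: return the object passed to
    fbq('track', '<event>', {...}) as a JSON string."""
    # Same idea as the solution: capture "{...}" lazily up to the "});" that
    # closes the fbq(...) call; re.DOTALL lets "." cross newlines.
    pattern = r"'" + re.escape(event) + r"',\s*(\{.*?\})\s*\);"
    match = re.search(pattern, js_text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no fbq {event!r} call found")
    obj = match.group(1)

    # Assumed stricter key rule: quote an identifier only when it directly
    # follows "{" or "," (ignoring whitespace) and is followed by ":".
    obj = re.sub(r"([{,]\s*)([A-Za-z_$][\w$]*)\s*:", r'\1"\2":', obj)

    # literal_eval accepts the single-quoted strings and trailing commas of
    # the JS literal; JS true/false/null would make it raise ValueError.
    return json.dumps(literal_eval(obj), indent=4)


script = """\
function fbq_w123456as() {
fbq('track', 'AddToCart', {
contents: [
{
'id': '123456',
'quantity': '',
'item_price':69.99 }
],
content_name: 'Stackoverflow',
content_category: '',
content_ids: ['w123456as'],
content_type: 'product',
value: 420.69,
currency: 'USD'
});
}"""

payload = fbq_payload_to_json(script)
print(payload)
```

Calling json.loads(payload) gives back a dict equal to the one shown above, with keys in the same order.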