Parse JavaScript array with empty elements using bs4

I am trying to parse the contents of this JavaScript <script> element using BS4.
I want to get the input array into a usable format (a Python list).

<script type="text/javascript">
    
        require.config.params['matchheader'] = {
            input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
    ,
            matchId: 1640674
        };


</script>

To get the text inside the global variable, I used the following regex:

re.search(r"input: \[.*?\]", script_element.string).group(0)

which returns:

"input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']"

I am having some trouble parsing this array because of the empty elements (literal_eval does not work).
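
For reference, here is a minimal reproduction of the failure. The shortened array and the names `snippet` and `parsed_ok` are made up for this illustration; the point is only that Python's list syntax has no equivalent of the empty slots between consecutive commas, so literal_eval rejects the text.

```python
from ast import literal_eval

# Shortened, hypothetical version of the array above, keeping the two empty slots.
snippet = "[6,'FT',,,'0 : 2']"

try:
    literal_eval(snippet)
    parsed_ok = True
except SyntaxError as exc:
    # Consecutive commas are not valid in a Python list literal.
    parsed_ok = False
    print("literal_eval failed:", exc.msg)
```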

Any idea on how to accomplish this? Is there an easier way to do it?

Regards

>Solution :

One solution is to insert None between each pair of adjacent commas and then parse the result with literal_eval:

import re
from ast import literal_eval

data = re.search(r"input:\s*(.*)", s).group(1)  # <-- `s` is your string from the question
data = re.sub(r"(?<=,)\s*(?=,)", "None", data)
data = literal_eval(data)

print(data)

Prints:

[162, 13, 'Crystal Palace', 'Arsenal', '05/08/2022 20:00:00', '05/08/2022 00:00:00', 6, 'FT', '0 : 1', '0 : 2', None, None, '0 : 2', 'England', 'England']
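
If you would rather run this on the whole script text (for example `script_element.string` from the question) instead of the already extracted string, the same approach can be wrapped in a small helper. This is only a sketch: the function name `parse_input_array` and the shortened sample script are made up for illustration. It assumes the array contains no nested `[...]` and no `]` or `,,` inside the quoted strings, because the regexes work on raw text and do not understand JavaScript quoting.

```python
import re
from ast import literal_eval


def parse_input_array(script_text):
    """Return the `input` array found in the script text as a Python list."""
    # Capture the bracketed array after `input:`; non-greedy, so it stops at the first `]`.
    match = re.search(r"input:\s*(\[.*?\])", script_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no `input: [...]` array found")
    array_text = match.group(1)
    # Fill every empty slot between two commas with None.
    array_text = re.sub(r"(?<=,)\s*(?=,)", "None", array_text)
    return literal_eval(array_text)


# Shortened, hypothetical script body modelled on the one in the question.
script_text = """
require.config.params['matchheader'] = {
    input: [162,13,'Crystal Palace','Arsenal',6,'FT','0 : 1',,,'0 : 2','England']
,
    matchId: 1640674
};
"""

row = parse_input_array(script_text)
print(row)
```

The ValueError is there so that a page without the expected `input:` array fails with a clear message instead of an AttributeError on `None`.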
