Parse JavaScript array with empty elements using bs4

I am trying to parse this JavaScript snippet using BeautifulSoup (bs4).
I want to convert the input array into a usable Python list.

<script type="text/javascript">
    
        require.config.params['matchheader'] = {
            input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
    ,
            matchId: 1640674
        };


</script>

To extract the array text from the script, I used the following regex:

re.search(r"input: \[.*?\]", script_element.string).group(0)

which returns:

"input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']"

I am having some trouble parsing this array because of the empty elements (literal_eval does not work).
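
A minimal sketch of the failure, using a shortened, hypothetical stand-in for the array text (the variable names are illustrative only). JavaScript accepts consecutive commas as empty slots, but Python's grammar does not, so literal_eval rejects the text before it evaluates anything:

```python
from ast import literal_eval

# Shortened, hypothetical stand-in for the array text from the question.
raw = "[162, 13, 'FT', '0 : 2',,, '0 : 2', 'England']"

try:
    literal_eval(raw)
    error_type = None
except SyntaxError as exc:
    # Two commas in a row are not valid Python syntax.
    error_type = type(exc).__name__

print(error_type)
```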

Any idea on how to accomplish this? Is there an easier way to do it?

Regards

Solution:

One solution is to insert None into each empty slot between consecutive commas and then parse the result:

import re
from ast import literal_eval

# Capture the array text that follows "input:"
data = re.search(r"input:\s*(.*)", s).group(1)  # <-- `s` is your string from the question
# Fill each empty slot between two commas with None
data = re.sub(r"(?<=,)\s*(?=,)", "None", data)
data = literal_eval(data)

print(data)

Prints:

[162, 13, 'Crystal Palace', 'Arsenal', '05/08/2022 20:00:00', '05/08/2022 00:00:00', 6, 'FT', '0 : 1', '0 : 2', None, None, '0 : 2', 'England', 'England']
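
The same approach can be wrapped into one helper that works directly on the full script text. This is only a sketch under stated assumptions: the function name parse_js_array and its key parameter are hypothetical, the array is assumed to be flat (no nested arrays), and no string inside it is assumed to contain "]" or two consecutive commas, since the regexes would misread either case.

```python
import re
from ast import literal_eval


def parse_js_array(script_text, key="input"):
    """Return the flat JavaScript array stored under `key` as a Python list.

    Hypothetical helper: assumes no nested arrays and no "]" or ",,"
    inside any string element.
    """
    # Capture from the opening bracket to the first closing bracket.
    match = re.search(rf"{re.escape(key)}:\s*(\[.*?\])", script_text, re.DOTALL)
    if match is None:
        raise ValueError(f"no array found for key {key!r}")
    array_text = match.group(1)
    # Fill each empty slot between two commas with None.
    array_text = re.sub(r"(?<=,)\s*(?=,)", "None", array_text)
    return literal_eval(array_text)


# Shortened stand-in for script_element.string from the question.
script_text = """
require.config.params['matchheader'] = {
    input: [162,13,'Crystal Palace','Arsenal',6,'FT','0 : 1',,,'0 : 2','England']
,
    matchId: 1640674
};
"""

result = parse_js_array(script_text)
print(result)
```

With bs4, the script_text argument would be the .string of the matching script tag, as in the question. Anchoring the capture on the brackets rather than on the end of the line keeps the helper working if the array and the following comma end up on the same line.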