I am trying to parse the following JavaScript element with BeautifulSoup (BS4).
I want to get the input array into a usable format, such as a Python list.
<script type="text/javascript">
require.config.params['matchheader'] = {
input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
,
matchId: 1640674
};
</script>
To get the array text out of the script, I used the following regex:
re.search(r"input: \[.*?\]", script_element.string).group(0)
which returns:
"input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']"
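A capture group around the brackets returns only the array text, so the "input:" prefix never needs to be stripped afterwards. This is a minimal sketch using only the standard re module. extract_input_array is a hypothetical helper name, and SCRIPT_TEXT stands in for what script_element.string returns for the script above.

```python
import re

# Stand-in for script_element.string (the <script> body from the question).
SCRIPT_TEXT = """
require.config.params['matchheader'] = {
input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
,
matchId: 1640674
};
"""


def extract_input_array(script_text: str) -> str:
    """Return only the '[...]' text that follows 'input:'.

    Assumes no string element contains ']', because the non-greedy
    '.*?' stops at the first closing bracket it finds.
    """
    # re.DOTALL lets '.' cross newlines in case the array spans several lines.
    match = re.search(r"input:\s*(\[.*?\])", script_text, re.DOTALL)
    if match is None:
        raise ValueError("no 'input: [...]' found in the script text")
    return match.group(1)


print(extract_input_array(SCRIPT_TEXT))
```

The returned string still contains the empty ",,," slots, so it has to be cleaned up before ast.literal_eval will accept it.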
I am having trouble parsing this array because of the empty elements: ast.literal_eval raises a SyntaxError on them.
Any idea how to accomplish this? Is there an easier way to do it?
Solution:
One solution is to insert None into each empty slot between consecutive commas and then parse the result with ast.literal_eval:
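import re
from ast import literal_eval
# `s` is the string returned by the regex in the question
s = "input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']"
data = re.search(r"input:\s*(.*)", s).group(1)  # keep only the "[...]" part
data = re.sub(r"(?<=,)\s*(?=,)", "None", data)  # put None between consecutive commas
data = literal_eval(data)  # now a valid Python list literal
print(data)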
Prints:
[162, 13, 'Crystal Palace', 'Arsenal', '05/08/2022 20:00:00', '05/08/2022 00:00:00', 6, 'FT', '0 : 1', '0 : 2', None, None, '0 : 2', 'England', 'England']
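The re.sub step works on the raw text, so it would also rewrite consecutive commas that happen to sit inside a quoted string (for example an element like 'a,,b'). If that is a concern, a small quote-aware scanner can split the array on top-level commas instead. This is a minimal sketch, not a full JavaScript parser: parse_js_array is a hypothetical helper name, and it assumes every element is a number or a quoted string whose escapes Python's literal syntax also understands (no nested arrays, objects, or keywords like true/null).

```python
import ast


def parse_js_array(text: str) -> list:
    """Convert a JavaScript array literal with holes into a Python list.

    Holes such as the ",,," in [1,,,2] become None. Commas inside quoted
    strings are left untouched because the scanner tracks quote state.
    """
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected a [...] array literal")
    body = body[1:-1]
    if not body.strip():
        return []

    tokens, current, quote = [], [], None
    i = 0
    while i < len(body):
        ch = body[i]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(body):
                # Keep the escaped character so \' does not close the string.
                i += 1
                current.append(body[i])
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == ",":
            # Top-level comma: finish the current element (possibly empty).
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
        i += 1
    tokens.append("".join(current).strip())

    # JavaScript ignores one trailing comma: [1,] has length 1, [1,,] has length 2.
    if tokens[-1] == "":
        tokens.pop()

    # Empty tokens are holes; everything else is a Python-compatible literal.
    return [None if tok == "" else ast.literal_eval(tok) for tok in tokens]


if __name__ == "__main__":
    array_text = "[162,13,'Crystal Palace','Arsenal',6,'FT',,,'0 : 2','England']"
    print(parse_js_array(array_text))
```

The trade-off is more code in exchange for not depending on the data never containing ",," inside a string; for the data in the question, both approaches give the same list.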