Unable to convert scraped list of dictionaries to a Pandas DataFrame

I am trying to scrape tables from the following website:

https://www.rotowire.com/betting/mlb/player-props.php

The data for each table sits inside a script on the page, in a block that starts with data: [{ ... }]. I can pull it out with a combination of BeautifulSoup and regex, but I cannot convert it into a Pandas DataFrame: everything ends up in a single row. The data is a list of dictionaries and looks like this:

[{"gameID":"2513620","playerID":"13902","firstName":"Mark"}, 
 {"gameID":"2512064","playerID":"12450","firstName":"Mike"},
 {"gameID":"2513053","playerID":"14261","firstName":"Will"}]

This should load with pd.DataFrame(data), but the scraped result does not parse correctly.

I have tried the following:

from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import json

url  = 'https://www.rotowire.com/betting/mlb/player-props.php'
page = requests.get(url, verify=False)

soup = BeautifulSoup(page.text)

# Read first table
script = str(soup.findAll('script')[4])
data   = re.findall(r'data: \[(.*?)\]', script)

df = pd.DataFrame(data)
                                                   0
0  {"gameID":"2513620","playerID":"13902","firstN...

Solution:

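The pattern in the question places the capture group inside the square brackets, so re.findall returns a list containing one long string, and pd.DataFrame stores that string in a single cell. Move the brackets inside the capture group so the match is a complete JSON array, then parse it with json.loads before building the DataFrame: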

from bs4 import BeautifulSoup
import pandas as pd
import requests
import re
import json

# verify=False triggers an InsecureRequestWarning on every request; silence it
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

url  = 'https://www.rotowire.com/betting/mlb/player-props.php'
page = requests.get(url, verify=False)

soup = BeautifulSoup(page.text, 'html.parser')

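# Read first table: its data sits in the fifth <script> tag
script = str(soup.find_all('script')[4])
# Capture the whole JSON array, brackets included
data   = re.search(r'data: (\[.*?\])', script)

# Parse the JSON text into a list of dicts, one DataFrame row per dict
df = pd.DataFrame(json.loads(data.group(1)))
print(df.head())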

Prints:

    gameID playerID firstName lastName             name team   opp                                                                  logo                                      playerLink draftkings_onehit draftkings_twohit draftkings_onehomerun draftkings_onerbi draftkings_onesb draftkings_pitchwin fanduel_onehit fanduel_twohit fanduel_onehomerun fanduel_onerbi fanduel_onesb fanduel_pitchwin mgm_onehit mgm_twohit mgm_onehomerun mgm_onerbi mgm_onesb mgm_pitchwin pointsbet_onehit pointsbet_twohit pointsbet_onehomerun pointsbet_onerbi pointsbet_onesb pointsbet_pitchwin
0  2513620    13902      Mark  Mathias     Mark Mathias  PIT   HOU  https://content.rotowire.com/images/teamlogo/baseball/100PIT.png?v=6     /betting/mlb/player/mark-mathias-odds-13902               115                                    1000              -115                                                -115            600                                                                        None       None           None       None      None         None             None             None                 None             None            None               None
1  2512064    12450      Mike   Zunino      Mike Zunino  CLE   NYY  https://content.rotowire.com/images/teamlogo/baseball/100CLE.png?v=6      /betting/mlb/player/mike-zunino-odds-12450               115                                     700              -120                                                -105            600                                                                        None       None           None       None      None         None             None             None                 None             None            None               None
2  2513053    14261      Will   Benson      Will Benson  CIN  @ATL  https://content.rotowire.com/images/teamlogo/baseball/100CIN.png?v=6      /betting/mlb/player/will-benson-odds-14261               110                                     900              -120                                                -140            410                                                                        None       None           None       None      None         None             None             None                 None             None            None               None
3  2513620    15016     Jason    Delay      Jason Delay  PIT   HOU  https://content.rotowire.com/images/teamlogo/baseball/100PIT.png?v=6      /betting/mlb/player/jason-delay-odds-15016               110                                    1000              -115                                                -135            460                                                                        None       None           None       None      None         None             None             None                 None             None            None               None
4  2514026    15672   Geraldo  Perdomo  Geraldo Perdomo  ARI   MIL  https://content.rotowire.com/images/teamlogo/baseball/100ARI.png?v=6  /betting/mlb/player/geraldo-perdomo-odds-15672               110                                                      -135                                                -155            380                                                                        None       None           None       None      None         None             None             None                 None             None            None               None
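
To see why the two patterns behave differently without hitting the site, here is a minimal offline sketch that uses only the standard library. The `script` string is a hypothetical stand-in for the text of one script tag (the surrounding `table.init` call and `pageSize` key are invented for illustration); only the three records come from the question.

```python
import json
import re

# Hypothetical stand-in for the text of one <script> tag on the page.
script = '''
$(function() {
    table.init({
        data: [{"gameID":"2513620","playerID":"13902","firstName":"Mark"},{"gameID":"2512064","playerID":"12450","firstName":"Mike"},{"gameID":"2513053","playerID":"14261","firstName":"Will"}],
        pageSize: 25
    });
});
'''

# Pattern from the question: the group sits inside the brackets, so
# findall returns one string per match with the brackets stripped off.
raw = re.findall(r'data: \[(.*?)\]', script)
# raw is a list holding a single str, not a list of dicts.

# Pattern from the answer: the group includes the brackets, so the
# captured text is a complete JSON array that json.loads can parse.
match = re.search(r'data: (\[.*?\])', script)
records = json.loads(match.group(1)) if match else []
# records is a list of dicts, one per player.
```

Passing `records` to pd.DataFrame gives one row per dictionary, whereas passing `raw` gives a single cell holding the whole string. Note that the non-greedy `.*?` stops at the first `]`, so this approach assumes the records themselves contain no nested arrays.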