Trying to get a table from a website (ValueError: If using all scalar values, you must pass an index)

I’m trying to make a function that automatically takes a table from a website (Wikipedia), cleans it a bit and then displays it. Everything worked well with my first two tables, but the third one is giving me some trouble.

This is the code to define the function:

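import numpy as np
import pandas as pd

def createTable(url, match):
    # read_html returns every table whose text matches `match`; the first one is used
    data = pd.read_html(url, match=match)
    name = data[0]["Name"]
    origin = data[0]["Origin"]
    type_ = data[0]["Type"]
    number = data[0]["Number"]
    df = pd.DataFrame({"Name": name, "Origin": origin, "Type": type_, "Number": number})
    # Treat "?" cells as missing values
    df.replace("?", np.nan, inplace=True)
    # Strip parenthesised notes and [n] footnote markers from the Number column
    df['Number'] = df['Number'].replace(to_replace={r"\(.*\)": "", r"\[.*\]": ""}, regex=True)
    return df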
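The last cleaning step relies on Series.replace treating the dict keys as regular expressions when regex=True. A minimal sketch on made-up cell values (not taken from the actual Wikipedia tables) shows what it does to typical "Number" entries:

```python
import pandas as pd

# Made-up examples of text found in a "Number" column:
# a count with a parenthesised note, a count with a footnote marker, a plain count.
raw = pd.Series(["10 (5 active)", "20[1]", "30"])

# Same call as in createTable: each dict key is a regex, each value its replacement.
cleaned = raw.replace(to_replace={r"\(.*\)": "", r"\[.*\]": ""}, regex=True)
print(cleaned.tolist())  # ['10 ', '20', '30']
```

The values stay strings (note the trailing space in '10 '), and because .* is greedy, a cell with two parenthesised notes loses everything between the first "(" and the last ")".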

And this is the function in use:

df_avIT= pd.DataFrame()
df_avIT= createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_Italian_Army",
            "125 To be upgraded and remain in service until 2035")
df_avUK= pd.DataFrame()
df_avUK= createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_British_Army",
            "Challenger 2")
df_avFR= pd.DataFrame()
df_avFR= createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_French_Army",
            "AMX Leclerc")

As I said, the first two calls work without any problem, but the third one raises ValueError: If using all scalar values, you must pass an index.
I know the code isn’t great and I’m trying to improve it, but this problem is blocking me and I can’t find a working solution, even though I searched various forums for problems similar to mine.
(I’m sorry if my English is bad; if something isn’t clear, tell me and I’ll try to explain it better.)
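The error can be reproduced offline. The sketch below assumes (the post does not confirm it) that the third table came back with two-level MultiIndex columns, which read_html can produce for a table whose header spans two rows; the table contents are made up. Selecting "Name" from such a table gives a DataFrame rather than a Series, and passing only DataFrames to pd.DataFrame(...) raises the same ValueError:

```python
import pandas as pd

# Made-up table with two-level column labels, standing in for what
# read_html may return for a Wikipedia table with a two-row header.
columns = pd.MultiIndex.from_tuples(
    [("Name", "Name"), ("Origin", "Origin"), ("Type", "Type"), ("Number", "Number")]
)
table = pd.DataFrame(
    [
        ["Vehicle A", "Country X", "Tank", "10"],
        ["Vehicle B", "Country Y", "APC", "20"],
    ],
    columns=columns,
)

# With MultiIndex columns, selecting the top level returns a DataFrame
name = table["Name"]
print(type(name).__name__)  # DataFrame

# pandas builds the row index from Series, dicts or 1-D list-like values;
# 2-D DataFrame values give it nothing to use, so construction fails
message = ""
try:
    pd.DataFrame({"Name": table["Name"], "Origin": table["Origin"],
                  "Type": table["Type"], "Number": table["Number"]})
except ValueError as err:
    message = str(err)
print(message)
```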

>Solution:

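Your script does not consistently produce a Series for name/origin/type_/number. For some tables (for example, one whose header spans two rows, which read_html turns into MultiIndex columns), data[0]["Name"] returns a DataFrame instead. pd.DataFrame does not treat a DataFrame value as a column, so when every value in the dict is a DataFrame it has no index to build the result from and raises this error. You can try to squeeze them into Series: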

name= data[0]["Name"].squeeze()
origin= data[0]["Origin"].squeeze()
type_= data[0]["Type"].squeeze()
number= data[0]["Number"].squeeze()
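One caveat with a bare squeeze(): on a DataFrame with a single row and a single column it squeezes both axes and returns a scalar, which brings back the same error, and with a flat header data[0]["Name"] is already a Series. The sketch below uses a hypothetical helper, to_series, that squeezes only the column axis and only when it receives a DataFrame; the table is made up, with two-level column labels as one example of a header that makes data[0]["Name"] a DataFrame.

```python
import pandas as pd


def to_series(obj):
    """Hypothetical helper: return a column selection as a Series.

    A flat header gives a Series (returned unchanged); a two-level header
    gives a one-column DataFrame, which is squeezed along the column axis
    only, so a one-row table still yields a Series instead of a scalar.
    """
    if isinstance(obj, pd.DataFrame):
        return obj.squeeze(axis="columns")
    return obj


# Made-up table with two-level column labels
columns = pd.MultiIndex.from_tuples(
    [("Name", "Name"), ("Origin", "Origin"), ("Type", "Type"), ("Number", "Number")]
)
table = pd.DataFrame(
    [
        ["Vehicle A", "Country X", "Tank", "10"],
        ["Vehicle B", "Country Y", "APC", "20"],
    ],
    columns=columns,
)

wanted = ["Name", "Origin", "Type", "Number"]
df = pd.DataFrame({col: to_series(table[col]) for col in wanted})
print(df)

# A bare squeeze() on a 1x1 DataFrame collapses it to a scalar...
one_row = table.head(1)
print(one_row["Name"].squeeze())  # Vehicle A
# ...while squeezing only the columns axis keeps a length-1 Series
print(type(to_series(one_row["Name"])).__name__)  # Series
```

If a top-level label maps to more than one sub-column, squeeze(axis="columns") returns the DataFrame unchanged, so you would still need to pick a sub-column explicitly (for example data[0]["Name"].iloc[:, 0]).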

Side note: df_avIT = pd.DataFrame() is unnecessary. You don’t need to initialize empty DataFrames, since the variable is overwritten by df_avIT = createTable(...) anyway.
