I’m trying to write a function that takes a table from a website (Wikipedia), cleans it up a bit, and then displays it. Everything worked well with my first two tables, but the third one is giving me trouble.
This is the code to define the function:
import numpy as np
import pandas as pd

def createTable(url, match):
    # read_html returns a list of every table whose text matches `match`
    data = pd.read_html(url, match=match)
    name = data[0]["Name"]
    origin = data[0]["Origin"]
    type_ = data[0]["Type"]
    number = data[0]["Number"]
    df = pd.DataFrame({"Name": name, "Origin": origin, "Type": type_, "Number": number})
    df.replace("?", np.nan, inplace=True)
    # strip parenthesised notes and bracketed footnote markers from the counts
    df['Number'] = df['Number'].replace(to_replace={r"\(.*\)": "", r"\[.*\]": ""}, regex=True)
    return df
and this is the function at work:
df_avIT = pd.DataFrame()
df_avIT = createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_Italian_Army",
                      "125 To be upgraded and remain in service until 2035")
df_avUK = pd.DataFrame()
df_avUK = createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_British_Army",
                      "Challenger 2")
df_avFR = pd.DataFrame()
df_avFR = createTable("https://en.wikipedia.org/wiki/List_of_equipment_of_the_French_Army",
                      "AMX Leclerc")
As I said, the first two give me no problem at all, but when I try the third it raises ValueError: If using all scalar values, you must pass an index.
I know the code isn’t great and I’m trying to improve it, but this problem is blocking me and I can’t find a working solution, even though I searched various forums for similar problems.
(I’m sorry if my English is bad. If something is unclear, tell me and I’ll try to explain more.)
>Solution :
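Your script does not consistently yield a Series for name/origin/type_/number. Sometimes data[0]["Name"] and the others are DataFrames instead (this happens, for example, when the table has a multi-level header or repeated column names), and pd.DataFrame cannot build columns from them. You can try to squeeze each one down to a Series: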
    name = data[0]["Name"].squeeze()
    origin = data[0]["Origin"].squeeze()
    type_ = data[0]["Type"].squeeze()
    number = data[0]["Number"].squeeze()
Side note: df_avIT = pd.DataFrame() is useless. You don’t need to initialize empty DataFrames, since the variable is overwritten by df_avIT = createTable(...) on the next line.
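To see why squeezing helps without hitting Wikipedia, here is a minimal offline sketch. It assumes the failing table has a two-level header, which is one common cause but is not confirmed for that page. The helper name as_series, the stand-in tables nested and flat, and all cell values are hypothetical placeholders.

```python
import pandas as pd


def as_series(col):
    """Hypothetical helper: collapse a one-column DataFrame to a Series."""
    if isinstance(col, pd.DataFrame):
        # Only squeeze the column axis, so a one-row table does not
        # collapse all the way down to a scalar.
        return col.squeeze(axis="columns")
    return col  # already a Series


# Stand-in for data[0] when the scraped table has a two-level header.
# The cell values are placeholders, not real figures.
nested = pd.DataFrame(
    [["Tank A", "Country X", "10"], ["Tank B", "Country Y", "20"]],
    columns=pd.MultiIndex.from_tuples(
        [("Name", "Name"), ("Origin", "Origin"), ("Number", "Number")]
    ),
)

# Stand-in for data[0] when the header is flat (the case that already worked).
flat = pd.DataFrame(
    [["Tank A", "Country X", "10"]], columns=["Name", "Origin", "Number"]
)

print(type(nested["Name"]).__name__)  # DataFrame
print(type(flat["Name"]).__name__)    # Series

# The same construction now works for both header layouts.
wanted = ["Name", "Origin", "Number"]
df_nested = pd.DataFrame({c: as_series(nested[c]) for c in wanted})
df_flat = pd.DataFrame({c: as_series(flat[c]) for c in wanted})
print(df_nested)
```

Passing axis="columns" is a deliberate choice here: a bare .squeeze() on a one-row, one-column DataFrame returns a scalar, which would bring back the same kind of error for very short tables.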