How to read an XML file in Python without node

I am trying to read this file in Python:

https://www.europarl.europa.eu/meps/en/full-list/xml/a

I have used this code:

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.europarl.europa.eu/meps/en/full-list/xml/a'
soup = bs(requests.get(url, headers=headers).text, 'lxml')
df = pd.read_xml(str(soup))
print(df)

But the result looks wrong:

   meps
0   NaN

Can anyone help me please?

Solution:

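BeautifulSoup's 'lxml' parser is an HTML parser, so it most likely wraps the document in html and body tags, and read_xml then sees a single meps element instead of the individual records. There is no need for intermediate libraries here; read_xml can read the URL directly: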

df = pd.read_xml('https://www.europarl.europa.eu/meps/en/full-list/xml/a')
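To see how read_xml turns elements into rows and columns without touching the network, here is a minimal sketch. The two-record sample is made up for illustration (modeled on the columns in the output shown in this answer, not copied from the real file), and the explicit xpath and parser="etree" arguments are choices made for this sketch, not something the call above requires.

```python
from io import StringIO

import pandas as pd

# Made-up two-record sample, modeled on the columns in the output of this answer.
SAMPLE_XML = """<meps>
  <mep>
    <fullName>Magdalena ADAMOWICZ</fullName>
    <country>Poland</country>
    <id>197490</id>
  </mep>
  <mep>
    <fullName>Asim ADEMOV</fullName>
    <country>Bulgaria</country>
    <id>189525</id>
  </mep>
</meps>"""

# Each <mep> element selected by the xpath becomes a row;
# its child elements become the columns.
# parser="etree" uses the standard library, so lxml is not needed for this sketch.
df = pd.read_xml(StringIO(SAMPLE_XML), xpath=".//mep", parser="etree")
print(df)
```

Spelling out the xpath makes it explicit which element is treated as a record, which can help if the default selection returns something unexpected.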

If you need to pass custom headers, use storage_options (for HTTP(S) URLs, pandas forwards these key-value pairs as request headers):

url = 'https://www.europarl.europa.eu/meps/en/full-list/xml/a'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}

df = pd.read_xml(url, storage_options=headers)

Output:

              fullName   country                                     politicalGroup      id                         nationalPoliticalGroup
0  Magdalena ADAMOWICZ    Poland  Group of the European People's Party (Christia...  197490                                    Independent
1          Asim ADEMOV  Bulgaria  Group of the European People's Party (Christia...  189525  Citizens for European Development of Bulgaria
2    Isabella ADINOLFI     Italy  Group of the European People's Party (Christia...  124831                                   Forza Italia
3      Matteo ADINOLFI     Italy                       Identity and Democracy Group  197826                                           Lega
4    Alex AGIUS SALIBA     Malta  Group of the Progressive Alliance of Socialist...  197403                               Partit Laburista
...
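If you want more control over the request than storage_options gives you (a timeout, for example), one option is to download the bytes yourself and hand them to read_xml. This is only a sketch: the helper names fetch_xml and xml_bytes_to_frame, the timeout value, and the parser argument are hypothetical and not part of pandas.

```python
from io import BytesIO
from urllib.request import Request, urlopen

import pandas as pd


def fetch_xml(url: str, headers: dict[str, str], timeout: float = 30.0) -> bytes:
    """Download the raw XML bytes, sending the given request headers."""
    request = Request(url, headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return response.read()


def xml_bytes_to_frame(xml_bytes: bytes, parser: str = "lxml") -> pd.DataFrame:
    """Parse XML bytes with read_xml; BytesIO makes the bytes look like a file."""
    return pd.read_xml(BytesIO(xml_bytes), parser=parser)


# Example usage (needs network access, so it is left commented out):
# url = "https://www.europarl.europa.eu/meps/en/full-list/xml/a"
# headers = {"User-Agent": "Mozilla/5.0"}
# df = xml_bytes_to_frame(fetch_xml(url, headers))
```

Keeping the download and the parsing in separate functions means the parsing step can be checked against a small local sample before any request is made.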
