How to read an XML file into pandas – TypeError: cannot parse from ‘Response’

I need to read the following XML file into pandas:

https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml

I have tried this code:

import requests

from lxml import objectify

url = requests.get("https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml")
parsed = objectify.parse((url))

When I run it, I get this error:

TypeError: cannot parse from ‘Response’

I don’t understand why.

Can someone help me please?

Solution:

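The error occurs because requests.get() returns a Response object, while objectify.parse() expects a file name, a URL, or a file-like object. Here is one way of obtaining that data, passing the text of the response to a parser instead: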

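from io import StringIO

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

# Send a browser-like User-Agent header with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.un.org/securitycouncil/sites/www.un.org.securitycouncil/files/consolidated.xml'
# Parse the response text (a string, not the Response object); the 'lxml' parser lowercases tag names
soup = bs(requests.get(url, headers=headers).text, 'lxml')
# Wrap the markup in StringIO: recent pandas versions deprecate passing a literal XML string
df = pd.read_xml(StringIO(str(soup)), xpath='.//individual')
print(df)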

Result in terminal:

    dataid  versionnum  first_name  second_name     third_name  un_list_type    reference_number    listed_on   comments1   designation     ...     individual_date_of_birth    individual_place_of_birth   individual_document     sort_key    sort_key_last_mod   name_original_script    fourth_name     gender  title   submitted_by
0   6908555     1   RI  WON HO  None    DPRK    KPi.033     2016-11-30  Ri Won Ho is a DPRK Ministry of State Security...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
1   6908570     1   CHANG   CHANG HA    None    DPRK    KPi.037     2016-11-30  None    NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
2   6908571     1   CHO     CHUN RYONG  None    DPRK    KPi.038     2016-11-30  None    NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
3   6908858     1   EMRAAN  ALI     None    Al-Qaida    QDi.430     2021-11-23  Senior member of Islamic State in Iraq and the...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
4   6908565     1   JO  YONG CHOL   None    DPRK    KPi.034     2016-11-30  Jo Yong Chol is a DPRK Ministry of State Secur...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
697     6908704     1   Ahmad   Oumar   Imhamad     Libya   LYi.023     2018-06-07  Listed pursuant to paragraphs 15 and 17 of res...   NaN     ...     NaN     NaN     NaN     NaN     NaN     احمد عمر امحمد الفيتوري     al-Fitouri  None    NaN     None
698     6908707     1   Abd     Al-Rahman   al-Milad    Libya   LYi.026     2018-06-07  Listed pursuant to paragraphs 15 and 17 of res...   NaN     ...     NaN     NaN     NaN     NaN     NaN     None    None    None    NaN     None
699     6908841     1   Amir    Muhammad Sa’id  Abdal-Rahman    Al-Qaida    QDi.426     2020-05-21  Leader of Islamic State in Iraq and the\n ...   NaN     ...     NaN     NaN     NaN     NaN     NaN     أمیر محمد سعید عبد\n ...    al-Salbi    None    NaN     None
700     2975510     1   FAIZULLAH   KHAN    NOORZAI     Taliban     TAi.153     2011-10-04  Prominent Taliban financier. As of mid-2009, s...   NaN     ...     NaN     NaN     NaN     NaN     NaN     فیض الله خان نورزی  na  None    NaN     None
701     2959427     1   SAID JAN    ‘ABD AL-SALAM   None    Al-Qaida    QDi.289     2011-02-09  In approximately 2005, ran a "basic training" ...   NaN     ...     NaN     NaN     NaN     NaN     NaN     سعید جان عبد السلام     None    None    NaN     None

702 rows × 25 columns
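If the BeautifulSoup step is not needed, the raw bytes of the response can probably be handed to pandas directly. The following is a minimal sketch under assumptions, not a tested recipe for the real file: it assumes the elements are named in uppercase (INDIVIDUAL, DATAID, and so on), which is only inferred from the lowercased columns above. SAMPLE_XML and individuals_frame are hypothetical names; SAMPLE_XML stands in for requests.get(url, headers=headers).content.

```python
from io import BytesIO

import pandas as pd

# Hypothetical miniature document standing in for the downloaded bytes.
# The uppercase tag names are an assumption about the real file.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<CONSOLIDATED_LIST>
  <INDIVIDUALS>
    <INDIVIDUAL>
      <DATAID>1</DATAID>
      <FIRST_NAME>ALPHA</FIRST_NAME>
      <UN_LIST_TYPE>Example</UN_LIST_TYPE>
    </INDIVIDUAL>
    <INDIVIDUAL>
      <DATAID>2</DATAID>
      <FIRST_NAME>BETA</FIRST_NAME>
      <UN_LIST_TYPE>Example</UN_LIST_TYPE>
    </INDIVIDUAL>
  </INDIVIDUALS>
</CONSOLIDATED_LIST>
"""


def individuals_frame(xml_bytes: bytes) -> pd.DataFrame:
    """Build one row per INDIVIDUAL element from raw XML bytes."""
    # read_xml accepts a file-like object, so wrap the bytes in BytesIO.
    df = pd.read_xml(BytesIO(xml_bytes), xpath=".//INDIVIDUAL", parser="etree")
    # Lowercase the column names to match the output shown above.
    df.columns = df.columns.str.lower()
    return df


df = individuals_frame(SAMPLE_XML)
print(df)
```

Here parser="etree" selects the standard-library parser; the default parser="lxml" should work as well when lxml is installed. Against the real URL, check the actual tag names before relying on the xpath.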

Also check the pandas documentation on reading XML documents (pandas.read_xml).
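To stay with lxml.objectify as in the question, one option is to give the parser the bytes of the document instead of the Response object. This is only a sketch under assumptions: xml_bytes is a hypothetical stand-in for requests.get(url).content, and the uppercase tag names in it are an assumption about the real file.

```python
from lxml import objectify

# Hypothetical stand-in for `requests.get(url).content` (raw bytes).
# The tag names below are assumed, not taken from the real file.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<CONSOLIDATED_LIST>
  <INDIVIDUALS>
    <INDIVIDUAL><DATAID>1</DATAID><FIRST_NAME>ALPHA</FIRST_NAME></INDIVIDUAL>
    <INDIVIDUAL><DATAID>2</DATAID><FIRST_NAME>BETA</FIRST_NAME></INDIVIDUAL>
  </INDIVIDUALS>
</CONSOLIDATED_LIST>
"""

# fromstring() parses the document bytes and returns the root element.
root = objectify.fromstring(xml_bytes)

# Attribute access walks to child elements; iterating over an element
# yields it together with its same-tag siblings.
first_names = [ind.FIRST_NAME.text for ind in root.INDIVIDUALS.INDIVIDUAL]
print(first_names)
```

If a file-like interface is preferred, wrapping the same bytes in io.BytesIO and passing that to objectify.parse() should also work, since parse() accepts file-like objects.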
