I have an XML file that looks something like the one below (it is longer, so I did not paste the whole thing). I am trying to read it with pandas' read_xml, but it just prints a table full of NaN values. How can I resolve this? (I am a newbie when it comes to XML files.)
import numpy as np
import pandas as pd
from tkinter import filedialog as fd
filename = fd.askopenfilename()
df = pd.read_xml(filename)
print(df)
<ScheduleMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" DtdVersion="3" DtdRelease="3">
<MessageIdentification v="20211022_DA_POS_65XGENEXMARKET0I" />
<MessageVersion v="1" />
<MessageType v="A01" />
<ProcessType v="A01" />
<ScheduleClassificationType v="A01" />
<SenderIdentification v="65XGENEXMARKET0I" codingScheme="A01" />
<SenderRole v="A01" />
<ReceiverIdentification v="10X1001C--00007L" codingScheme="A01" />
<ReceiverRole v="A04" />
<MessageDateTime v="2021-10-21T10:02:02Z" />
<ScheduleTimeInterval v="2021-10-21T22:00Z/2021-10-22T22:00Z" />
<ScheduleTimeSeries>
<SendersTimeSeriesIdentification v="S_10Y1001A1001B012_65YBG-ENERGRIDDB" />
<SendersTimeSeriesVersion v="1" />
<BusinessType v="A02" />
<Product v="8716867000016" />
<ObjectAggregation v="A03" />
<InArea v="10Y1001A1001B012" codingScheme="A01" />
<OutArea v="10Y1001A1001B012" codingScheme="A01" />
<InParty v="65YBGGENEX000002" codingScheme="A01" />
<OutParty v="65YBG-ENERGRIDDB" codingScheme="A01" />
<MeasurementUnit v="MAW" />
<Period>
<TimeInterval v="2021-10-21T22:00Z/2021-10-22T22:00Z" />
<Resolution v="PT1H" />
<Interval>
<Pos v="1" />
<Qty v="0" />
</Interval>
Solution:
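I would start by validating the XML file. As pasted, the snippet is not well-formed XML: the closing tags for Period, ScheduleTimeSeries, and ScheduleMessage are missing, although that may only be because the sample was truncated.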
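A quick way to check well-formedness is to let Python's standard-library parser try to read the text. The sketch below is only an illustration: the helper name is_well_formed and the two short sample strings are made up for this example, and a check like this only tests that tags are properly nested and closed, not that the file matches any schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    Hypothetical helper for illustration; it checks well-formedness only.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as exc:
        return False, str(exc)


# A complete fragment and a truncated one (the <Period> tag is never closed).
complete = '<Interval><Pos v="1" /><Qty v="0" /></Interval>'
truncated = '<Period><Interval><Pos v="1" /><Qty v="0" /></Interval>'

ok_complete, _ = is_well_formed(complete)
ok_truncated, error = is_well_formed(truncated)
print(ok_complete, ok_truncated, error)
```

If the full file passes this check, the NaN values are probably not caused by invalid XML.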
To read an XML file with pandas and then convert it to a CSV or Excel file, you can use the pandas_read_xml library:
import pandas_read_xml as pdx
You can then read the file with the following line:
df = pdx.read_xml('path-to-your-XML-file.xml')
You also need to flatten the result after reading the XML file:
df = pdx.fully_flatten(df)
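If you would rather avoid an extra dependency, note that in this file the values are stored in the v attribute of nested elements rather than as element text, which is probably why a plain read_xml call shows mostly NaN. The following is a minimal sketch using only the standard library. It assumes the missing closing tags look like the ones added to SAMPLE below, and the function name intervals_to_rows and the output column names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question. The closing tags are an
# assumption, because the pasted sample was cut off before them.
SAMPLE = """<ScheduleMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" DtdVersion="3" DtdRelease="3">
  <MessageIdentification v="20211022_DA_POS_65XGENEXMARKET0I" />
  <ScheduleTimeSeries>
    <SendersTimeSeriesIdentification v="S_10Y1001A1001B012_65YBG-ENERGRIDDB" />
    <MeasurementUnit v="MAW" />
    <Period>
      <Resolution v="PT1H" />
      <Interval>
        <Pos v="1" />
        <Qty v="0" />
      </Interval>
    </Period>
  </ScheduleTimeSeries>
</ScheduleMessage>"""


def intervals_to_rows(xml_text):
    """Collect one dict per <Interval>, reading values from the 'v' attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter("ScheduleTimeSeries"):
        series_id = series.find("SendersTimeSeriesIdentification").get("v")
        unit = series.find("MeasurementUnit").get("v")
        for interval in series.iter("Interval"):
            rows.append({
                "series": series_id,
                "unit": unit,
                "pos": int(interval.find("Pos").get("v")),
                "qty": float(interval.find("Qty").get("v")),
            })
    return rows


rows = intervals_to_rows(SAMPLE)
print(rows)
```

The resulting list of dictionaries can be passed to pd.DataFrame(rows) to get one row per interval, which can then be written out with to_csv or to_excel.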