Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extract data from array – Python

I am using the unleashed_py library to extract Unleashed data.

The sample of the output is as below where there could be several items in the invoice:

[{
 'OrderNumber': 'SO-00000742',
  'QuoteNumber': None,
  'InvoiceDate': '/Date(1658496322067)/',
  'InvoiceLines': [{'LineNumber': 1,
    'LineType': None},
   {'LineNumber': 2,
    'LineType': None}],
  'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
  'LastModifiedOn': '/Date(1658496322221)/'}]

I need to get a df:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

enter image description here

If I run the below script, the invoice lines just get appended with the common fields such as ordernumber, quotenumber, invoicedate, guide, and lastmodifiedon not getting repeated.

order_number = []
quote_number = []
invoice_date = []
invoicelines = []
invoice_line_number = []
invoice_line_type = []
guid = []
last_modified = []


for item in df:
    order_number.append(item.get('OrderNumber'))
    quote_number.append(item.get('QuoteNumber'))
    invoice_date.append(item.get('InvoiceDate'))
    guid.append(item.get('Guid'))
    last_modified.append(item.get('LastModifiedOn'))
    lines = item.get('InvoiceLines')
    for item_sub_2 in lines:
        invoice_line_number.append('LineNumber')
        invoice_line_type.append('LineType')

df_order_number = pd.DataFrame(order_number)
df_quote_number = pd.DataFrame(quote_number)
df_invoice_date = pd.DataFrame(invoice_date)
df_invoice_line_number  = pd.DataFrame(invoice_line_number)
df_invoice_line_type = pd.DataFrame(invoice_line_type)
df_guid = pd.DataFrame(guid)
df_last_modified = pd.DataFrame(last_modified)

df_row = pd.concat([
df_order_number,
df_quote_number,
df_invoice_date,
df_invoice_line_number,
df_invoice_line_type,
df_guid,
df_last_modified
], axis = 1)

What am I doing wrong?

>Solution :

You don’t need to iterate, just create the dataframe out of the list of dictionaries you have, then explode InvoiceLines columns then apply pd.Series and join it with the original dataframe:

data = [{
    'OrderNumber': 'SO-00000742',
    'QuoteNumber': None,
    'InvoiceDate': '/Date(1658496322067)/',
    'InvoiceLines': [{'LineNumber': 1,
                      'LineType': None},
                     {'LineNumber': 2,
                      'LineType': None}],
    'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
    'LastModifiedOn': '/Date(1658496322221)/'}]


df=pd.DataFrame(data).explode('InvoiceLines')
out=pd.concat([df['InvoiceLines'].apply(pd.Series),
           df.drop(columns=['InvoiceLines'])],
          axis=1)

OUTPUT:

#out

   LineNumber  LineType  OrderNumber QuoteNumber            InvoiceDate  \
0         1.0       NaN  SO-00000742        None  /Date(1658496322067)/   
0         2.0       NaN  SO-00000742        None  /Date(1658496322067)/   
                                       Guid         LastModifiedOn  
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/  
0  8f6b89da-1e6e-42288a24-902a-038041e04f06  /Date(1658496322221)/  

I’m leaving the date conversion and column renames for you cause I believe you can do that yourself.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading