How to import multiple records with a MERGE statement into an Oracle DB via Python

I am trying to import data from a .csv file into an Oracle DB using Python. It works fine as long as the .csv file contains 10 records. If I increase the number of records in the .csv file to 1,000,000, the script takes far too long and does not finish even after an hour.

Can anyone tell me how I can optimise my source code?

Kind regards


Jegor

...
sql_insert = """Merge into TEST_TABLE a 
                Using (Select
                           :ID as ID,
                           :COUNTRY as COUNTRY ,
                           :DATE as DATE
                      From Dual) src
                on src.ID = a.ID
                when matched then
                      update set
                           a.COUNTRY = src.COUNTRY,
                           a.DATE = src.DATE
                when not matched then
                       Insert (a.ID, a.COUNTRY, a.DATE)
                       Values (src.ID, src.COUNTRY, src.DATE)"""

# Get S3-File
obj = s3.Object(CDH_S3_Bucket, CDH_Path + '/' + s3_filename)
body = obj.get()['Body'].read().decode('utf-8').splitlines()

# ----------------------------------------------------------
csv_reader = csv.reader(body, delimiter=',')
headings = next(csv_reader)  # skip the header row
data = []
for line in csv_reader:
    data.append(line)

if data:
    cursor.executemany(sql_insert, data)
    connection.commit()

cursor.close()
...

Solution:

A MERGE is meant to modify one table based on the data in another table; it is not intended for row-by-row processing from the client like this. The proper design is to use a normal bulk-bind INSERT to load the rows into a work table, and then run a single MERGE execution to sync the target table with that work table.
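A minimal sketch of that two-step pattern, assuming a staging table named TEST_TABLE_STAGE, positional binds, a 50,000-row batch size, and a helper `batches()` for chunking (none of these names appear in the original code):

```python
import csv

def batches(lines, size=50_000):
    """Yield lists of CSV rows in chunks of `size`, skipping the header row."""
    reader = csv.reader(lines, delimiter=',')
    next(reader)  # discard the heading line
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, chunk
        yield batch

# Hypothetical usage with the cursor/connection from the question:
# for batch in batches(body):
#     cursor.executemany(
#         'INSERT INTO TEST_TABLE_STAGE (ID, COUNTRY, "DATE") VALUES (:1, :2, :3)',
#         batch)
# cursor.execute(sql_merge)   # one MERGE from the stage into TEST_TABLE
# connection.commit()
```

Chunking keeps client memory bounded, and each `executemany` call ships a whole chunk to the database in a single round trip instead of one row at a time.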

Also, when you do use a MERGE (appropriately), avoid the WHERE and DELETE subclauses of the `when matched then update ...` clause. In a MERGE, that WHERE clause behaves differently from other SQL statements: it is evaluated as part of the update step itself, so you already pay the cost of consistent-read (CR) block gets, redo generation, and the other concurrency mechanisms even for rows the WHERE clause ends up excluding from the update. Instead, filter out unchanged rows inside the USING clause: pre-join the source to the target table there and discard rows that would not change anything. You can also emit the target table's ROWID from the USING subquery for optimal matching in the outer MERGE, but the main benefit is that the filtering happens before the DML touches the target.
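Sketched in the style of the question's code (the staging table TEST_TABLE_STAGE is an assumption carried over from the two-step pattern above, and since DATE is a reserved word in Oracle it is quoted here):

```python
# Pre-join the stage to the target inside USING, keep only new or changed
# rows, and match on the target's ROWID emitted by the subquery.
sql_merge = """Merge into TEST_TABLE a
               Using (Select t.ROWID as RID,
                             s.ID, s.COUNTRY, s."DATE"
                        From TEST_TABLE_STAGE s
                        Left Join TEST_TABLE t On t.ID = s.ID
                       Where t.ID is null                -- brand-new row
                          or t.COUNTRY <> s.COUNTRY      -- or something actually changed
                          or t."DATE" <> s."DATE"        -- (NULL-able columns need NULL-safe compares)
                     ) src
               on (a.ROWID = src.RID)
               when matched then
                     update set
                          a.COUNTRY = src.COUNTRY,
                          a."DATE" = src."DATE"
               when not matched then
                     Insert (a.ID, a.COUNTRY, a."DATE")
                     Values (src.ID, src.COUNTRY, src."DATE")"""
```

With this shape, rows that are identical in stage and target never reach the update at all, so no CR reads, locks, or redo are spent on them.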

The DELETE subclause also fires only after the update is processed, and only on rows that were just updated. If a row is being updated, it is present in your incoming data, so by definition it is not old (unless your files deliver stale data). To delete old records, use a completely separate archiving statement rather than folding it into your loading SQL.
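A separate archiving pass might look like this; the 12-month retention rule and the use of the "DATE" column are purely illustrative, not taken from the question:

```python
# Hypothetical cleanup statement, run on its own schedule, separate from the load:
sql_archive = """Delete from TEST_TABLE a
                  Where a."DATE" < add_months(sysdate, -12)"""
```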
