Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Are there any obvious bottlenecks in my MariaDB UPDATE?

I have a MariaDB table with 28 million rows. I need to update all the rows (locally) with a new value in a column. This is the code in Python to batch update:

update_query = "UPDATE table SET column = %s WHERE `index` = %s"
%time cursor.executemany(update_query, update_data)

column has an index, and index is the primary key.

When I use a batch of 100,000 rows, it takes 50 seconds to run the executemany(). I am not sure if this is very slow or normal. And if it’s slow, I don’t know where to start for speeding it up.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Unlike INSERT, executemany() can’t automatically batch UPDATE queries, so the bottleneck is performing 100,000 queries.

Instead, use INSERT INTO ... ON DUPLICATE KEY UPDATE .... This will be batched automatically.

update_query = 'INSERT INTO table (column, index) VALUES (%s, %s) ON DUPLICATE KEY UPDATE column = VALUES(column)'
%time cursor.executemany(update_query, update_data)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading