Hekaton and Durability

The Microsoft technical paper (Hekaton: SQL Server’s Memory-Optimized OLTP Engine) says:

Generating a log record only at transaction commit time is possible because Hekaton does not use write-ahead logging (WAL) to force log changes to durable storage before dirty data. Dirty data is never written to durable storage. Furthermore, Hekaton tries to group multiple log records into one large I/O; this is the basis for group commit and also a significant source of efficiency for Hekaton commit processing.

No doubt, avoiding WAL and grouping multiple log records into one large I/O can improve performance, but I’m struggling to understand how durability can be guaranteed with such "improvements".


Can someone explain how (or whether) Hekaton handles the situation where a crash happens after commit but before the changes are persisted in the log?

According to the SQL Server team blog:

Hekaton’s main memory structures do away with the overhead and indirection of the storage optimized view while still providing the full ACID properties expected of a database system. For example, durability in Hekaton is achieved by streamlined logging and checkpointing that uses only efficient sequential IO.

Solution:

Can someone explain how (or whether) Hekaton handles the situation where a crash happens after commit but before the changes are persisted in the log?

Commit happens only after the log records are written, just like in traditional write-ahead logging. But with write-ahead logging the database changes themselves are also written to disk after the log records, which requires that the in-memory data structures and the on-disk data structures be mostly the same. Memory-optimized tables are never written to disk in place; instead they are reconstructed from the log stream and checkpoint files during crash recovery.

Using a data structure that can be efficiently written to and read from disk creates overhead, since you can’t simply refer to a row through native memory pointers. Instead you have to locate data by FileId:PageId:SlotId, or by an index key value that you find by traversing a B-tree. If you allow the in-memory format to differ from the on-disk format, you can refer to rows using pointers, describe the row layouts as C structs, and generate data-access programs that run as natively compiled C operating on those pointers and structs.
