Hekaton and Durability

The Microsoft technical paper (Hekaton: SQL Server’s Memory-Optimized OLTP Engine) says:

Generating a log record only at transaction commit time is possible because Hekaton does not use write-ahead logging (WAL) to force log changes to durable storage before dirty data. Dirty data is never written to durable storage. Furthermore, Hekaton tries to group multiple log records into one large I/O; this is the basis for group commit and also a significant source of efficiency for Hekaton commit processing.

No doubt, avoiding WAL and grouping multiple log records into one large I/O can improve performance, but I’m struggling to understand how durability can be guaranteed with such "improvements".


Can someone explain how (or whether) Hekaton handles the situation where a crash happens after commit but before the changes are persisted in the log?

According to the SQL Server team blog:

Hekaton’s main memory structures do away with the overhead and indirection of the storage optimized view while still providing the full ACID properties expected of a database system. For example, durability in Hekaton is achieved by streamlined logging and checkpointing that uses only efficient sequential IO.

Solution:

Can someone explain how (or whether) Hekaton handles the situation where a crash happens after commit but before the changes are persisted in the log?

Commit happens only after the log records are written, just like in traditional write-ahead logging. But with write-ahead logging the database changes themselves are also written to disk after the log records, which requires that the in-memory data structures and the on-disk data structures be mostly the same. Memory-optimized tables are never written to disk in place; instead they are reconstructed from the log stream and checkpoint files during crash recovery.

Using a data structure that can be efficiently written to and read from disk creates overhead, since you can’t simply refer to a row through native memory pointers. Instead you have to locate data by FileId:PageId:SlotId, or by an index key value that you find by traversing a B-tree. If you allow the in-memory format to differ from the on-disk format, you can refer to rows using pointers, describe the row layouts as C structs, and generate data-access programs that run as natively compiled C operating on those pointers and structs.
