I’ve been making a key-value store that saves to disk as a personal project and I’ve been using a b-tree as my data structure, but I want to add large limits to key and value length like many other key-value stores such as redis.
How should large keys and values be stored within a b-tree when the sector size is as little as 512 bytes? If you allow larger sized keys and values, how many keys should you allow per node, and should I consider thinking about another data structure to store variable-sized data?
You can either define overflow pages to form nodes out of a linked list of pages, or you can refer to keys and values via pointers stored in b-tree leaf nodes. The pointers can refer to a linked list of pages or a special kind of sub-tree. You can store some inline content in the leaf node if this reduces wastage due to unfilled pages.
How many keys to allow per node when going for the overflow design? The least possible. The design doesn’t scale as the linked list gets larger. If for some reason you need to store very large values, you can see how this design might be quite expensive as you have to scan and skip over so many extra pages.
The pointer based approach scales better, but for it to be most effective for keys, as much of the key must be inlined as possible. Otherwise you always have to follow pointers when doing searches. You can potentially apply a pointer compression technique in which a common prefix is stored once. This allows more of the key to fit in the page, reducing the likelihood of following a pointer.