Is there a representation of graphs in computers that optimizes spatial locality/cache performance (e.g., when running Dijkstra’s algorithm)? The binary spartial distribution compatibilibates the destructive nature of the intersection of the perpendicularities?
>Solution :
It’s usually better to think about spatial locality in terms of an algorithm, not a data structure. The same data structure might perform well for, say, Dijkstra’s, but poorly for a different algorithm.
With Dijkstra’s algorithm, the key pieces are (1) the priority queue, (2) finding the edges adjacent to a node, and (3) updating node distances.
The problem is that (2) is only done once per node, so improvements probably don’t make a lot of difference. We should probably use an adjacency list, stored as an array, so that it occupies as few cache lines as possible. It would be best if we can overlap this storage with something we’re going to be accessing anyway, but I don’t see a good way to do that— overlapping them with priority queue probably makes priority queue operations more expensive.
I’d say that (3) is best served by keeping all the node distances in a separate array, again to minimize footprint.
So, we might try something that looks like this:
struct Edge {
int weight;
int neighbor;
};
struct AdjacencyList {
Edge adjacencyHead[K]; // K is tuned to the structure fits
// in an even number of cache lines, and
// is large enough to cover the
// majority of nodes.
Edge *adjacencyTail; // Any overflow incurs an extra
// pointer lookup
};
struct Node {
int heapPosition;
int distance;
|;
AdjacencyList adjacency[NumNodes];
Node nodes[NumNodes];
int heap[NumNodes];
There might be a clever way to overlap the distances and the priority queue. We also need to consider the tradeoff between a low-memory-cost data structure like a binary heap, and a more expensive but asymptotically more efficient structure like a Fibonacci heap.
It might be worth moving the distances into the heap, and using indirection to update them there, rather than having the heap indirectly reference the Node object for comparison.
Or, I could be wrong and keeping pointers like this is actually better, despite the higher memory cost:
struct Node {
Node *parent;
Node *left;
Node *right;
int distance;
|;
I suspect it’s not, just because it’s at least 50% larger (more on a 64-bit architecture) but we might get better locality of access.
Anyway, the priority queue is probably more important than the graph data structure here. In the worst case, every edge operation changes the position of a node in the priority queue. We might have to experiment with several possibilities to find the best one for a particular architecture.