The heap is a classical data structure that puts a complete binary tree (or d-ary tree, in the generalized version) into a contiguous array, storing the elements in breadth-first traversal order. In this way, all elements from the same level of the tree are stored contiguously, one after the other.

I'm implementing a data structure which, under the hood, is a complete balanced tree of fixed degree d, and I want to store the tree in contiguous form to save the space taken by node pointers. So I thought of putting the nodes into the breadth-first order used in heaps, but then I'm worried about the cache performance of a typical search from the root down to a leaf, since at each level I jump over a large number of elements.

Is there a way to obtain a compact contiguous representation of a d-ary complete tree, based on depth-first order instead?

This way, the nodes touched during a search down to a leaf seem to me more likely to be found close to each other. The problem then is how to retrieve the index of the parent and children of a node, but I'm also wondering which operations on the tree are in general efficient in this setting.

I'm implementing this thing in C++, in case it matters at all.

# Answer:

For simplicity, I'm going to limit my discussion to binary trees, but what I say holds true for n-ary trees, too.

The reason heaps (and trees in general) are stored in arrays breadth-first is because it's much easier to add and remove items that way: to grow and shrink the tree. If you're storing depth-first, then either the tree has to be allocated at its maximum expected size, or you have to do a lot of moving items around when you add levels.

But if you know that you're going to have a complete, balanced, n-ary tree, then the choice of BFS or DFS representation is largely a matter of style. There isn't any particular benefit to one over the other in terms of memory performance. In one representation (DFS) you take the cache misses up front, and in the other case (BFS) you take the cache misses at the end.

Consider a binary tree with 20 levels (i.e. 2^20 - 1 items) that contains the numbers from 0 to 2^20 - 2. Each node occupies four bytes (the size of an integer), so a 64-byte cache line holds 16 nodes.

With BFS, you incur a cache miss when you fetch the first block of the tree. But then you have the first four levels of the tree (15 nodes) in cache, so your next three queries are *guaranteed* to be in cache. After that, you're guaranteed a cache miss whenever the node index is greater than 15, because the left child, at `x*2 + 1`, will be at least 16 positions (64 bytes) away from the parent.

With DFS, you incur a cache miss when you read the first block of the tree. As long as the number you're searching for is in the left subtree of the current node, you're guaranteed not to get a cache miss for the first 15 levels, because in a pre-order layout going left moves to the adjacent array slot (you continually go left). But any branch that goes right will incur a cache miss until you get down to three levels above the leaves. At that point, the entire remaining subtree fits into the cache and your remaining queries incur no cache misses.

With BFS, the number of cache misses is directly proportional to the number of levels you have to search. With DFS, the number of cache misses is proportional to the path taken through the tree and the number of levels you have to search. But *on average*, the number of cache misses you incur when searching for an item will be the same for DFS as for BFS.

And the math for computing the node positions is easier for BFS than it is for DFS, especially when you want to find the parent of a particular node.