I am working on a problem where I need to make use of segment trees. However, my underlying array in the segment tree in this problem has 1e9 possible locations, but it can only conain1e5 elements.Obviously, I can’t store all 2e9 elements in my segtree, so how can I change my current implementation? (My current implementation only has two methods, modify, and query)
My segment tree implementation currently:
int N;
int seg[2 * MAXN];
void modify(int p, int val) {
for (seg[p += N] = val; p > 0; p >>= 1)
seg[p >> 1] = seg[p] + seg[p ^ 1];
}
int query(int l, int r) {
int res = 0;
for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
if (l & 1)
res += seg[l++];
if (r & 1)
res += seg[--r];
}
return res;
}
>Solution :
However, say your underlying array had 1e9 possible locations, but it only contained 1e5 elements. Obviously, you can’t store all 2e9 elements in your segtree, so what should you do? Here’s one solution, replace the array with a hash table. However, as adamant mentions, unordered_map has too much overhead. We’ll be benchmarking against the dynamic segtree provided here. I’ll also be using a custom hash function. Benchmarking it with 1e5 random insertions and 1e5 random queries,
pointer: 0.171485
unordered_map: 2.0646
Wow. The unordered_map is nearly 12x slower. That’s not really feasible for a lot of contests. What if we replace it with a policy hash table, though?
pointer: 0.202186
policy hash table: 0.384312
Only a 2x decrease in speed. That’s already very feasible. However, one might notice that since maps in C++ create elements if you try to access a key that doesn’t exist, we’re creating a lot of useless elements. Thus, we can simply wrap a check to make sure the element is in the array before we try to access it
gp_hash_table<ll, ll, chash> seg;
ll get(ll x) { return (seg.find(x) == seg.end()) ? 0 : seg[x]; }
void modify(ll p, ll val) {
for (seg[p += N] = val; p > 0; p >>= 1) {
seg[p >> 1] = get(p) + get(p ^ 1);
}
}
ll query(ll l, ll r) {
ll res = 0;
for (l += N, r += N; l < r; l >>= 1, r >>= 1) {
if (l & 1)
res += get(l++);
if (r & 1)
res += get(--r);
}
return res;
}
Results (averaged over twenty runs):
2e5 insertions and 2e5 queries
pointer: 0.44085
policy hash table: 0.57878
1e5 insertions and 1e5 queries
pointer: 0.19855
policy hash table: 0.29467
1e4 insertions and 1e4 queries
pointer: 0.014
policy hash table: 0.027
So while we’re nearly twice as slow with 1e4 elements and 1e4 queries, we’re actually only 30% slower with 2e5 insertions and 2e5 queries.
One more thing. While I’m giving numbers like "30% slower", that’s a little bit misleading. If we break down the numbers between insertion/querying, we see this:
pointer: 0.41625
policy hash table: 0.15627
Insertions:
pointer: 0.1367
policy hash table: 0.42619
1e4 insertions and 1e4 queries
pointer : 0.094
policy hash table: 0.007
Insertions:
pointer : 0.0045
policy hash table: 0.0191
So as we see from this more granular breakdown, the Policy Hash Table implementation is actually ~3x faster at querying than the custom implementation, while the custom implementation is roughly ~3x faster at inserting elements.
TL;DR: Using policy hash tables is an extremely easy and fairly efficient method of implementing dynamic segtrees.