Combined Search and Encoding for Seeds, with an Application to Minimal Perfect Hashing
-
Autor:
Hans-Peter Lehmann, Peter Sanders, Stefan Walzer
- Quelle:
- Datum: February 2025
-
Randomised algorithms often employ methods that can fail and that are retried with independent randomness until they succeed. Randomised data structures therefore often store indices of successful attempts, called seeds. If n such seeds are required (e.g., for independent substructures) the standard approach is to compute for each i∈[n] the smallest successful seed S_i and store S⃗ =(S_1,…,S_n).
The central observation of this paper is that this is not space-optimal. We present a different algorithm that computes a sequence S⃗ ′=(S′_1,…,S′_n) of successful seeds such that the entropy of S′→ undercuts the entropy of S⃗ by Ω(n) bits in most cases. To achieve a memory consumption of OPT+εn, the expected number of inspected seeds increases by a factor of O(1/ε).
We demonstrate the usefulness of our findings with a novel construction for minimal perfect hash functions with space requirement (1+ε)OPT. The construction time is O(n/ε) while all previous approaches have construction times that increase exponentially with 1/ε. Our implementation beats the construction throughput of the state of the art by up to two orders of magnitude.