ShockHash Minimal Perfect Hash Functions on the GPU

  • We have recently developed a new perfect hash function called ShockHash.

    A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. There is a lower bound of about n*log(e) bits of space needed to represent an MPHF. A matching upper boundis obtained using the brute-force algorithm that triesrandom hash functions until stumbling on an MPHF andstores that function’s seed. In expectation, about e^n seeds need to be tested. The most space-efficient previousalgorithms for constructing MPHFs all use such a brute-force approach as a basic building block. ShockHash uses two hash functions h0 and h1 , hoping for the existenceof a function f : S -> {0, 1} such that x -> h_{f (x)}(x) is an MPHF on S. In graph terminology, ShockHash generates n-edge random graphs until stumbling on a pseudoforest – a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store f using about n bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only about (e/2)^n hash function seeds in expectation. This reduces the space for storing the seed by roughly n bits (maintaining the asymptotically optimal space consumption) and speeds up construction by almost a factor of 2^n compared to brute-force.


    Design, develop, and benchmark a parallel construction algorithm for ShockHash on the GPU in CUDA. Because some of the steps in ShockHash construction need data structures with rather irregular memory access, a hybrid CPU/GPU implementation is probably most efficient. A goal would be to break the previous best space of 1.489*n bits (lower bound is 1.442*n bits).


    - Interest in space efficient data structures
    - Knowledge of C++ and CUDA


    - ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force