ShockHash Minimal Perfect Hash Functions on the GPU
 Subject:Perfect Hashing
 Type:Masterarbeit
 Supervisor:

We have recently developed a new perfect hash function called ShockHash.
A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. There is a lower bound of about n*log(e) bits of space needed to represent an MPHF. A matching upper boundis obtained using the bruteforce algorithm that triesrandom hash functions until stumbling on an MPHF andstores that function’s seed. In expectation, about e^n seeds need to be tested. The most spaceefficient previousalgorithms for constructing MPHFs all use such a bruteforce approach as a basic building block. ShockHash uses two hash functions h0 and h1 , hoping for the existenceof a function f : S > {0, 1} such that x > h_{f (x)}(x) is an MPHF on S. In graph terminology, ShockHash generates nedge random graphs until stumbling on a pseudoforest – a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1bit retrieval data structure to store f using about n bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only about (e/2)^n hash function seeds in expectation. This reduces the space for storing the seed by roughly n bits (maintaining the asymptotically optimal space consumption) and speeds up construction by almost a factor of 2^n compared to bruteforce.
Task
Design, develop, and benchmark a parallel construction algorithm for ShockHash on the GPU in CUDA. Because some of the steps in ShockHash construction need data structures with rather irregular memory access, a hybrid CPU/GPU implementation is probably most efficient. A goal would be to break the previous best space of 1.489*n bits (lower bound is 1.442*n bits).
Requirements
 Interest in space efficient data structures
 Knowledge of C++ and CUDAReferences
 ShockHash: Towards OptimalSpace Minimal Perfect Hashing Beyond BruteForce