Wavelet Tree Construction on the GPU

  • Bit Vectors are one of the most basic data structures in computer science. Operations on bit vectors include rank and select queries.

     

    - rank_1(k) returns the number of 1-bits up to position k in the bit vector.
    - select_1(k) returns the position at which the k-th 1-bit is stored.

     

    One of the many applications of bit vectors with rank and select support are Wavelet Trees. A Wavelet Tree can store a compressed sequence of integers and answer rank and select queries on those integers. This can be used, for example, in bioinformatics. Wavelet trees can be constructed in parallel by splitting the input sequence, constructing independent wavelet trees, and merging the results. The idea of merging is simple but it is rather slow because it requires a large amount of non-contiguous data to be copied. GPUs support fast memory access, which could make this parallel wavelet tree construction more efficient.

    There seem to be efficient implementations of neither rank and select data structures, nor wavelet tree construction on GPUs. The Nvidia nvbio library provides an implementation but does not use state of the art algorithms.

     

    Tasks:
    - Design, develop, and benchmark a parallel construction algorithm for bit vector rank and select data structures on GPUs.
    - Use the bit vectors to design, develop, and benchmark a state of the art parallel construction algorithm for wavelet tree construction on GPUs.
    - Maybe contribute back to the nvbio library?

    Requirements:
    - Interest in space efficient data structures
    - Knowledge of C++ and CUDA

    References:
    - Grossi, Guptat, Vittert. High-Order Entropy-Compressed Text Indexes
    - Wavelet Tree implementierung in the nvbio library: https://nvlabs.github.io/nvbio/waveletfm_page.html
    - Fuentes-Sepulveda, Elejalde, Ferres, Seco. Efficient Wavelet Tree Construction and Querying for Multicore Architectures
    - Dinklage, Ellert, Fischer, Kurpicz, Löbel. Practical Wavelet Tree Construction