KIT – ITI Algorithm Engineering – Group Members - Scalable Decentralized Fault-Tolerant MapReduce for Iterative Algorithms

Links:PDF
The increase in parallel compute nodes in a high-performance computing (HPC) system is followed by an increase in hardware failures for applications running on millions of nodes. Therefore, future large-scale parallelized applications need to expect, detect and handle compute node failure. We provide a decentralized fault-tolerant MapReduce framework for iterative al- gorithms designed to handle a HPC node failure. Our framework follows the bulk synchronous parallel model (BSP) by using random static load balancing during the shuffle phase. We propose a fault-tolerance mechanism, which exchanges additional self-messages during the shuffle phase and saves them in memory. In contrast to previous approaches, we do not save checkpoints on a fault-tolerant file system. Furthermore, we provide a purely MPI based MapReduce framework, as well as a hybrid parallelization using multiple OpenMP threads per process. Moreover, we prove the expected runtime of the different MapReduce phases and the fault- tolerance mechanism. Finally, we implement the word count, page rank, connected component, and R-Mat algorithms in our framework and perform runtime experiments. We can observe super linear speedups for the save self-message phase. The other phases of our framework, except the shuffle phase have optimal phases if the input is large enough compared to the number of processes and the bottleneck workload.