Abstract
We present work-optimal PRAM algorithms for Burrows–Wheeler compression and decompression of strings over a constant alphabet. For a string of length n , the depth of the compression algorithm is O(log2 n, and the depth of the corresponding decompression algorithm is O(logn) . These appear to be the first polylogarithmic-time work-optimal parallel algorithms for any standard lossless compression scheme. The algorithms for the individual stages of compression and decompression may also be of independent interest: (1) a novel - O(logn) time, O(n)-work PRAM algorithm for Huffman decoding; (2) original insights into the stages of the BW compression and decompression problems, bringing out parallelism that was not readily apparent, allowing them to be mapped to elementary parallel routines that have - O(logn) time, O(n)-work solutions, such as: (i) prefix-sums problems with an appropriately-defined associative binary operator for several stages, and (ii) list ranking for the final stage of decompression. Follow-up empirical work suggests potential for considerable practical speedups on a PRAM-driven many-core architecture, against a backdrop of negative contemporary results on common commercial platforms