Faster Longest Common Extension Queries in Strings over General Alphabets

Abstract

Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in O(qloglogn+nlogn) time making only O(q+n) symbol comparisons. Consequently, all runs in a string over a general ordered alphabets can be computed in O(nloglogn) time making O(n) symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who designed an algorithm with O(nlog2/3n) running time and conjectured that O(n) time is possible. Our paper makes a significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, when the time increases to O(qlogn+nlogn). The main tools are difference covers and a variant of the disjoint-sets data structure by La Poutré (SODA 1990).

Publication
CPM pp:5:1-5:13
Date