In computer science, the two-way string-matching algorithm is a string-searching algorithm, discovered by Maxime Crochemore and Dominique Perrin in 1991. It takes a pattern of size m, called a “needle”, preprocesses it in linear time O(m), producing information that can then be used to search for the needle in any “haystack” string, taking only linear time O(n) with n being the haystack's length.
The two-way algorithm can be viewed as a combination of the forward-going Knuth–Morris–Pratt algorithm (KMP) and the backward-running Boyer–Moore string-search algorithm (BM). Like those two, the 2-way algorithm preprocesses the pattern to find partially repeating periods and computes “shifts” based on them, indicating what offset to “jump” to in the haystack when a given character is encountered.
Unlike BM and KMP, it uses only O(log m) additional space to store information about those partial repeats: the search pattern is split into two parts (its critical factorization), represented only by the position of that split. Being a number less than m, it can be represented in ⌈log₂ m⌉ bits. This is sometimes treated as "close enough to O(1) in practice", as the needle's size is limited by the size of addressable memory; the overhead is a number that can be stored in a single register, and treating it as O(1) is like treating the size of a loop counter as O(1) rather than log of the number of iterations. The actual matching operation performs at most 2n − m comparisons.
Breslauer later published two improved variants performing fewer comparisons, at the cost of storing additional data about the preprocessed needle:
The algorithm is considered fairly efficient in practice, being cache-friendly and using several operations that can be implemented in well-optimized subroutines. It is used by the C standard libraries glibc, newlib, and musl, to implement the memmem and strstr family of substring functions. As with most advanced string-search algorithms, the naïve implementation may be more efficient on small-enough instances; this is especially so if the needle isn't searched in multiple haystacks, which would amortize the preprocessing cost.