The KMPautomaton functor


Synopsis

signature KMP
functor KMPautomaton (LABEL) : KMP

A functor constructing a Knuth-Morris-Pratt search automaton to search for single pattern strings in a text.


Functor result interface

type t
datatype
  s
  = DummyPat of t
structure M : PPCFM
val kmp : t list -> s M.machine
val display : t list -> OS.Process.status
val find : t list -> t list -> int list

Description

type t
the type of letters taken from the input structure.

datatype
  s
  = DummyPat of t
a type distinguishing between pattern and other symbols.

kmp pat
returns the automaton for searching with pattern pat.

display pat
displays the automaton for searching with pattern pat, using _ for all symbols not in the pattern. Writes files kmp.tmp.dot and kmp.tmp.ps.

find pat txt
returns the list of initial positions of occurrences of pat in txt.


See Also

ppMatcher

Discussion

Using the dummy symbols, the restriction on the alphabet of simpleKMPautomaton is avoided: the search continues beyond the first letter of the text that is not in the pattern:

Example:
- structure IntKmp = KMPautomaton(IntLabels);
structure IntKmp : KMP?

- IntKmp.find [0,0] [0,0,0,1,0,1,0,0,0,1,0];
val it = [0,1,6,7] : int list
Note that the KMP-automaton for pat is just the minimal deterministic automaton for Alphabet*;pat. Therefore, the functor ppMatcher is more general than KMPautomaton. But note that ppMatcher reports hits by their end positions, so it can report hits for several patterns simultaneously.