What is SFST?
SFST is a toolbox for the implementation of morphological analysers
and other tools which are based on finite state transducer technology.
The SFST tools comprise
- a compiler which translates transducer programs into transducers
- interactive and batch-mode analysis programs
- tools for comparing and printing transducers
- an efficient C++ transducer library
- freely available under the GNU General Public License
- easy to learn for users who are familiar with grep, sed, or Perl
- efficient implementation in C++
- a wide range of transducer operations
- UTF-8 character coding
- weighted transducers (basic functionality only)
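To illustrate what a finite-state transducer does in a morphological analyser, here is a minimal Python sketch. It is not SFST code and does not use the SFST library or its file formats. The `Transducer` class, the toy lexicon, and the tags such as `<V>` and `<3sg>` are all hypothetical names chosen for this example. Each arc pairs a symbol on the analysis side with a symbol on the surface side, and analysis means collecting the analysis side of every path whose surface side spells the input word.

```python
from collections import defaultdict

EPS = ""  # empty symbol: the arc consumes nothing on the surface side


class Transducer:
    """Toy finite-state transducer (illustrative only, not the SFST API)."""

    def __init__(self, start=0):
        self.start = start
        self.finals = set()
        # state -> list of (analysis_symbol, surface_symbol, target_state)
        self.arcs = defaultdict(list)

    def add_arc(self, src, analysis_sym, surface_sym, dst):
        self.arcs[src].append((analysis_sym, surface_sym, dst))

    def analyse(self, word):
        """Return all analysis strings whose surface side equals `word`.

        Depth-first search over paths. Assumes the transducer has no
        cycles of arcs with an empty surface symbol, otherwise the
        search would not terminate.
        """
        results = set()
        stack = [(self.start, 0, "")]
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.add(out)
            for a_sym, s_sym, dst in self.arcs.get(state, []):
                if s_sym == EPS:
                    stack.append((dst, pos, out + a_sym))
                elif word.startswith(s_sym, pos):
                    stack.append((dst, pos + len(s_sym), out + a_sym))
        return sorted(results)


def build_toy():
    """Build a transducer for the forms walk, walks, walked."""
    t = Transducer()
    state = 0
    for ch in "walk":  # stem: identical on both sides
        t.add_arc(state, ch, ch, state + 1)
        state += 1
    t.add_arc(state, "<V><inf>", EPS, state + 1)    # walk
    t.add_arc(state, "<V><3sg>", "s", state + 1)    # walks
    t.add_arc(state, "<V><past>", "ed", state + 1)  # walked
    t.finals.add(state + 1)
    return t


if __name__ == "__main__":
    toy = build_toy()
    for form in ["walk", "walks", "walked", "walking"]:
        print(form, toy.analyse(form))
```

In this sketch an unknown form such as "walking" simply yields an empty list. A real toolkit additionally provides operations such as composition and minimisation, which are not shown here.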
- Source code of the SFST tools
- version 1.4.7a (the replacement operation now works correctly with alphabets that contain non-identity mappings; problems with incompatible alphabets in fst-parse solved)
- version 1.4.6j (downward replacement is now the exact opposite of upward replacement)
- version 1.4.6h (comments are now optionally allowed in the lexicon, faster fault-tolerant lookup)
- version 1.4.6a (Improvement of the efficiency of the minimisation and composition operations. Many thanks to Anssi Yli-Jyrä for his support!)
- version 1.4.4 (Bug related to multi-character symbols in the input was fixed.)
- version 1.4.3 (Optional replace operations have changed)
- version 1.4.2 (includes Hopcroft minimisation and other modifications which were jointly developed with the HFST team at Helsinki)
- version 1.3 (fst-print now produces a different output format which might affect the graphical viewers listed below)
- version 1.2
- A short manual (included in the source code package)
on the implementation of computational morphologies (included in the source code package)
- SMOR, a German finite-state morphology which is based on SFST.
- EMOR, an English finite-state morphology using SFST.
- A Debian package for SFST (created by Francis Tyers)
- Software for finding potential errors in your SFST code (created by Eleonora Nagy)
Please cite the following publication if you want to refer to the SFST tools:
Helmut Schmid, A Programming Language for Finite State Transducers, Proceedings of the 5th International Workshop on Finite State Methods in Natural Language Processing (FSMNLP 2005), Helsinki, Finland.
Relations to other FST Toolkits
There are two projects which aim to extend the functionality of SFST in various ways:
- Anssi Yli-Jyrä's AFST toolkit is based on SFST.
- The HFST toolkit developed by Krister Lindén, Kimmo Koskenniemi, and colleagues was implemented on top of the three alternative FST libraries SFST, OpenFST, and foma.
See also the contributions by other authors below.
- Alex Linke provided
to the Graphviz tool for the graphical output of transducers.
- Sebastian Nagel wrote a mode for editing transducer files and a program which converts SFST transducers to the Graphviz format (similar to that of Alex Linke).
- Stefan Evert also sent me a
- Matthias Kistler provided a highlighting mode for the VIM editor.
- Toni Arnold developed
  - a Python interface for the SFST library and
  - Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list.
- Marius L. Jøhndal created a Ruby interface for SFST
- UIMA wrapper for SFST (developed at the UKP Lab)
Please send comments, suggestions and bug reports to Helmut Schmid at FirstName.LastName@cis.uni-muenchen.de. (Insert the name into the email address.)