A 'finite state transducer' ('FST') is a
finite state machine with two tapes: an input tape and an output tape.
Contrast this with an ordinary
finite state automaton, which has a single tape. An automaton can be said to ''recognize'' a string if we view the content of its tape as input. In other words, the automaton computes a function that maps strings into the set {0,1}. Alternatively, we can say that an automaton ''generates'' strings, which means viewing its tape as an output tape. On this view, the automaton generates a
formal language, which is a set of strings. The two views of automata are equivalent: the function that the automaton computes is precisely the
indicator function of the set of strings it recognized. The class of languages generated by finite automata is known as the class of
regular languages.
The two tapes of a transducer are typically viewed as an input tape and an output tape. On this view, a transducer is said to ''transduce'' (i.e., translate) the contents of its input tape to its output tape, by accepting a string on its input tape and generating another string on its output tape. It may do so
nondeterministically and it may produce more than one output for each input string. A transducer may also produce no output for a given input string, in which case it is said to ''reject'' the input. In general, a transducer computes a
relation between two formal languages. The class of relations computed by finite state transducers is known as the class of
rational relations.
Finite State Transducers are typically used for
morphological analysis in
natural language processing research and applications.
Formal construction
Formally, a finite transducer ''T'' is a tuple (''Q'', Σ, Γ, ''I'', ''F'', δ) such that:
★ ''Q'' is a
finite set, the set of ''states'';
★ Σ is a finite set, called the ''input alphabet'';
★ Γ is a finite set, called the ''output alphabet'';
★ ''I'' is a
subset of ''Q'', the set of ''initial states'';
★ ''F'' is a subset of ''Q'', the set of ''final states''; and
★
(where ε is the
empty string) is the ''transition relation''.
We can view (''Q'', δ) as a labeled
directed graph, known as the ''transition graph'' of ''T'': the set of vertices is ''Q'', and
means that there is a labeled edge going from vertex ''q'' to vertex ''r''. We also say that ''a'' is the ''input label'' and ''b'' the ''output label'' of that edge.
NOTE: This definition of finite transducer is also called ''letter transducer'' (Roche and Schabes 1997); alternative definitions are possible, but can all be converted into transducers following this one.
Define the ''extended transition relation''
as the smallest set such that:
★
;
★
for all
; and
★ whenever
and
then
.
The extended transition relation is essentially the reflexive
transitive closure of the transition graph that has been augmented to take edge labels into account. The elements of
are known as ''paths''. The edge labels of a path are obtained by concatenating the edge labels of its constituent transitions in order.
The ''behavior'' of the transducer ''T'' is the rational relation [''T''] defined as follows:
if and only if there exists
and
such that
. This is to say that ''T'' transduces a string
into a string
if there exists a path from an initial state to a final state whose input label is ''x'' and whose output label is ''y''.
Operations on finite state transducers
The following operations defined on finite automata also apply to finite transducers:
★
Union. Given transducers ''T'' and ''S'', there exists a transducer
such that
if and only if
or
.
★ Concatenation. Given transducers ''T'' and ''S'', there exists a transducer
such that
if and only if
and
.
★
Kleene closure. Given a transducer ''T'', there exists a transducer
with the following properties: (1)
; (2) if
and
then
; and
does not hold unless mandated by (1) or (2).
Note that there is no notion of
intersection of transducers. Instead, there is an operation of ''composition'' that is specific to transducers and whose construction is similar to that of intersection of automata. Composition is defined as follows:
★ Given a transducer ''T'' on alphabets Σ and Γ and a transducer ''S'' on alphabets Γ and Δ, there exists a transducer
on Σ and Δ such that
if and only if there exists a string
such that
and
.
One can also project out either tape of a transducer to obtain an automaton. There are two projection functions:
preserves the input tape, and
preserves the output tape. The first projection,
is defined as follows:
★ Given a transducer ''T'', there exists a finite automaton
such that
accepts ''x'' if and only if there exists a string ''y'' for which
.
The second projection,
is defined similarly.
Additional properties of finite state transducers
★ It is
decidable whether the relation [''T''] of a transducer ''T'' is empty.
★ It is decidable whether there exists a string ''y'' such that ''x''[''T'']''y'' for a given string ''x''.
★ It is
undecidable whether two transducers are equivalent.
★ If one defines the alphabet of labels
, finite state transducers are isomorphic to
NDFA over the alphabet
, and may therefore be determinized (turned into
deterministic finite state machines over the alphabet
) and subsequently minimized so that they have the minimum number of states.
See also
★
Mealy machine
Further reading
Speech and Language Processing, , Daniel, Jurafsky, Prentice Hall, 2000, ISBN 0-13-095069-6
Finite-state language processing, , Emmanuel, Roche, MIT Press, 1997, ISBN 0-262-18182-7