The idea of Tabulation is quite general and is present under different forms in many domains of application. The basic principle consists into saving traces of computation in some table. It become then possible for instance to check the repetition of a subcomputation to avoid looping or to reuse the result of a subcomputation in different contexts.
Maybe the most popular example of tabulation is the so called "memoization" technique used in imperative or functional programming, for instance to compute the function Fibonacci.
Computation Sharing provided by tabulating is even more
interesting when dealing with highly non-deterministic computations
where similar subcomputations may be found in the different
alternative. However, techniques to use are also more difficult in
this context, principally because the result of a computation is
usually the least fix point of some operator. For instance, in Logic
Programming, to compute the answers to some (sub)query
q(X)
where q
is a recursive predicate, one
has to propagate the already computed answers until reaching the
fix-point.
Areas of Application
Tabulation techniques may be used to avoid a good deal of infinite looping in Logic Programming. They also provide a good support to implement extended negation, in particular for stratified programs.
They also have been used in Deductive Data Bases, here again for termination and completeness. In the context of Deductive Data Bases, they are generally related to program transformation techniques called Magic Set.
Abstract Interpretation is also a domain where tabulation is useful to compute complete least-fix point of some abstract computation for a program. Moreover, termination is important because Abstract Interpretation is ideally a phase in a compiling process (and compilers don't loop). Computation sharing induced by tabulation is also increased when computing a fix-point over an abstract domain, because this domain is much smaller than the original concrete one.
In computational linguistics, tabulation is needed to face the
inherent ambiguity of language. A simple example of propositional
attachment in sentences may lead to a combinatorial time explosion
without some kind of tabulation (generally ensured by using a
chart parser). Algorithm CKY is maybe the most basic form of
tabular algorithm in parsing where the algorithm fills in a bottom-up
way a 2 dimensional table, indexed by string position and constituent
length. More sophisticated are Earley algorithm and chart parser,
where are tabulated labeled edges between string positions. A tutorial
(in French) presented at TALN99 tries to retrace the history of
tabulation in NLP and to outline the current state of art.
Push-Down Automata
Push-Down Automata [PDA] are well-known devices used in Formal Language Theory to characterize for instance Context-Free Languages. Their main component is a stack, i.e. a last-in first-out data structure. PDAs are actually well-adapted to express the notion of procedure calls and recursion, where one has to suspend temporally the current computation to process a sub-computation.
There are many interesting extensions of the traditional PDAs working on stack of symbols. In particular, I've been using and refining the notion of Logic Push-Down Automata [LPDA] working on stack of first-order terms and have introduced the notion of Subsumption-Oriented PDAs over ordered abstract domain of stacks.
More exotic variations of PDAs such as 2-stack automata [2SA] or Embedded Push-Down Automata [EPDA] are also interesting to deal with Tree Adjoining Grammars and Linear Indexed Grammars.
In my opinion, PDAs (and more generally automata) are very useful to
cleanly model the operational semantic to be associated with a
declarative grammar or logic program.
Dynamic Programming
Dynamic Programming could actually be seen as a synonym for tabulation. This is a principle to design algorithms where one tries to break computation into sub-computations that may be used several times in different contexts. These sub-computations are tabulated in order to save re-computation.
Actually, Dynamic Programming is mostly used to solve optimization problems where sub-computations corresponds to local optimums that may be joined to compute a global one. A good example is the Knapsack problem.
Dynamic Programming may also be used to solve many graph traversal
problems (such as shortest path or path collector between 2 points).
Following derivations of a transition may also be seen as a kind of
graph traversal and submitted to Dynamic Programming
techniques. Actually, the topology of the graph associated to the
derivations of a Push-Down Automata exhibits a lot of regularities, in
other words, a lot of possibilities for computation sharing using
Dynamic Programming techniques. In particular, a PDA derivation may
ignore for a while the bottom part of the control stack.
Subsumption
Subsumption is an important issue when using tabulation to keep traces
of monotonous subcomputations over ordered domains. Indeed, it is then
only necessary to tabulate traces associated to most general
sub-computations, other ones being just instances.
DyALog
DyALog has been developed during the last few years
to show the validity of tabulation for logic programs and grammars.
There is no recent and complete description of DyALog, but one can browse the following papers (in French and English).
The current Release 1.11.0 of DyALog may be downloaded either as a source package or a binary rpm. The current version runs under Linux/i*86 and (possibly) SunOS. The documentation is still largely incomplete and (unfortunately) to be updated (PDF Version). The NEWS retraces the history of DyALog functionnalities.
Typed Feature Structures are available as well as bidirectional parsing for Definite Clause Grammars and Bound Movement Grammars, hilog notation, and some aggregate predicates. Release 1.1 also introduced the compilation of Tree Adjoining Grammars. An XTAG-like architecture for TAGs (with tree families, lemma, lexicons, and (co)anchors) is possible in Release 1.4.2 .
Release 1.5.0 offers the possiblity to declare interfaces to C functions.
Release 1.9.0 includes an experiemental implementation of Range Concatenation Grammars [RCG] extended with logical arguments.
Release 1.10.1 has introduced floats, namespaces, modules, and a toplevel. There is also a new class of more efficient tabulated predicates and support for left-corner parsing strategies for DCG. Support has also been added to start new projects with DyALog.
Release 1.10.2 is a minor release
Release 1.10.3 has introduced hybrid Tree Insertion Grammars (TIGs) and TAGs parsing (with analyze of TAGs to extract the TIG part). Multiple adjunction is possible for TIG left and right adjunctions. This version has also introduced a kleene star operator for DCGs, TAGs and TIGs as well as an interleave operator for DCGs.
Release 1.10.4 is a minor release that fixes a few bugs and improves support for Kleene star and interleave operators.
Papers and presentation slides are available, [TALN02] (in French), [TAPD98] (in English) [ATALA99] (in French).
DyALog may now be tried on line for
differents grammars (TAG and BMG) and parsers.
Indexing
Efficient term searching in the table modulo unification or subsumption are one of the main problems yet to solve in DyALog. A system of tries is currently used where searching is done by following term arguments in a depth-first and left-to-right way (looking terms as trees), and ignoring variable correlations. This scheme does not work well when storing complex terms that only discriminate late on the indexing path. Moreover, this scheme only provide an approximation of searching modulo unification and must be completed by costly unification on each potential candidate. Note that is still better that what provided by most Prolog systems, where indexing is only done on the first predicate argument.
A better approach seems to use substitution trees to
factorize unification along a tree whose leaves are the indexed
terms. Searching is then close from a Prolog process with
backtracking. There is no need for further unification when reaching
leaves. This scheme should be tried in a near future in
DyALog.
Structure Sharing
Reducing the memory cost of tabulating terms is a necessity, hence the
need for some efficient structure-sharing mechanism (in opposition of
copying mechanism found in most Prolog systems). The delicate point to
address is the problem of variable renaming: indeed, tabulated terms
must be considered as without common variable.
DyALog uses an efficient structure-sharing mechanism
[POPL93].
Other Tabular Systems
Besides DyALog, there are different tabular systems:
A seminal reference on Typed Feature Structures [TFS] is Carpenter's book. Based on this book, the system ALE is freely available and runs on top of most Prolog systems.
Given their linguistic importance,
Typed Feature Structures have been added to DyALog [TFS]. Their
integration has also shown that Structure-Sharing (as provided by
DyALog) is a plus for TFS.
Tree Adjoining Grammars
Tree Adjoining Grammars are a very interesting class of linguistically motivated grammars introduced by Joshi. They are natural extensions of Context-Free Grammars and belong also to a larger class of formalisms parsable in polynomial space and time: Mildly Context Sensitive Grammars. They are also equivalent to Linear Indexed Grammars.
Tabular algorithms exists for TAGs and LIGs but are usually designed for a specific parsing strategy and have varying complexities. There also exist different kinds of automata useful for TAGs and LIGs, such as 2-stack automata [2SA] and Embedded Push-Down Automata [EPDA].
In a recent joint work, Miguel Alonso Pardo and I have proposed a restricted class of 2-SA that may be used to describe a wide range of parsing strategies for TAGs and LIGs. A Dynamic Programming interpretation of these 2-SA exists that gives the best known time and space general complexity. A more specialized version has also been proposed for bottom-up strategies that reduce the space complexity (but not the time complexity).
[ACL98]
[TAG+4]
[MOL99]
[EACL99]
[PhD Alonso]
[TAG+5a]
Shared Parse Forest
An very interesting by-product of tabular parsers is the possibility to represent the whole set of parse trees for a sentence by a compact shared parse forest. For instance, such a forest may represent in cubic space an exponential set of parse trees, for a context-free grammar.
A parse forest may be used to share both common top or bottom parts of parse trees. Even more interestingly, they are actually formally equivalent to grammars. This property holds for logic based grammars (and logic programs) with a supplementary characteristic: parse forest may be used to enumerate all parse tree without failure during the enumeration process.
From a practical point of view, parse forest seem to be good candidate for post-parsing processes such as translation, where some ambiguities may cross unchanged over languages while easy identification of others will ease a man-machine dialogue.
DyALog can display parse shared
forest. In the context of DyALog integration in the
GATE platform, I have also added a small parse
forest browser. Still tentative, parse forest may now be manipulated
from within DyALog in order to be able to express
forest transformation (for instance, to build a shared "semantic
forest" from a parse forest).
Collaborations