Although the techniques outlined here have been
applied in ways specific to a systemic grammar
and to one particular implementation, the
principles behind the re-representations are
general to all implementations:
- Avoid DNF-expansion where possible, as in
Kasper's unification algorithm.
- Delay expansion until later: information
gained later may show the description to be
inconsistent in the definite component, in which
case the expansion is never needed.
- When expansion is necessary:
  - Try to extract sub-descriptions which can
  be used, rather than expanding the entire
  grammar.
  - Expand first the disjunctions which are most
  likely to conflict, since this reduces the
  total number of terms which need to be
  multiplied.
  - Avoid expanding terms that are known to
  be incompatible.
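
The last two principles can be illustrated with a minimal sketch. It is not the implementation described in this paper: it assumes a toy representation in which a term is a flat dictionary of feature-value pairs and a disjunction is a list of alternative terms, and the names unify and expand are hypothetical. Alternatives that conflict with the definite component are discarded before any multiplication, and the disjunctions with the fewest surviving alternatives are multiplied in first, as a simple stand-in for "most likely to conflict".

```python
from typing import Dict, List, Optional

Term = Dict[str, str]        # a conjunction of feature: value pairs
Disjunction = List[Term]     # alternatives, one of which must hold


def unify(a: Term, b: Term) -> Optional[Term]:
    """Merge two terms; return None if they disagree on any feature."""
    merged = dict(a)
    for feature, value in b.items():
        if merged.setdefault(feature, value) != value:
            return None
    return merged


def expand(definite: Term, disjunctions: List[Disjunction]) -> List[Term]:
    """Multiply out the disjunctions, pruning inconsistent terms early."""
    # Drop alternatives already incompatible with the definite component,
    # so they are never multiplied into the result.
    filtered = [[alt for alt in d if unify(definite, alt) is not None]
                for d in disjunctions]
    # Assumed heuristic: disjunctions with few surviving alternatives
    # go first, keeping the intermediate term list small.
    filtered.sort(key=len)
    terms = [definite]
    for d in filtered:
        terms = [u for t in terms for alt in d
                 if (u := unify(t, alt)) is not None]
        if not terms:
            break  # the whole description is inconsistent
    return terms


definite = {"mood": "declarative"}
disjunctions = [
    [{"voice": "active"}, {"voice": "passive"}],
    [{"mood": "declarative", "tense": "past"},
     {"mood": "interrogative", "tense": "present"}],
]
result = expand(definite, disjunctions)
```

In this example the interrogative alternative is removed before expansion, so only two terms are produced rather than the four that a blind multiplication would generate and then filter.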
By applying these techniques (and others not
mentioned here), we have been able to implement
a parsing system which parses using a large
systemic grammar.
- We start with the Nigel grammar, as used in
the Penman Generation System, slightly modified
for parsing purposes.
- This grammar is then reduced by applying
register-restrictions, leaving a less complex
grammar which still handles the bulk of the
phenomena in the target texts.
- Sub-descriptions of the grammar tailored
for particular processes are then extracted and
expanded as a precompile step, producing a set
of "chunks" which can be used in parsing. This
expansion takes approximately 2 minutes using Sun
Common Lisp on a Sun Sparc II.
- The "chunked" grammar is then used to
parse sentences. On the above-mentioned platform,
a sentence like "A user-password is a
character string consisting of a maximum of eight
alpha-numeric characters." took 35 seconds to
parse. This is slow compared to most
non-systemic parsers, but far faster than the
parser would be without the methods outlined here.
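
The restriction and precompile steps above can be pictured with a small sketch. It is only a loose analogy under strong assumptions: a grammar is reduced to a list of named systems, each a flat choice among features, with none of the entry conditions or realisation rules of the Nigel grammar, and the names restrict, precompile and chunk_specs are hypothetical. The point it shows is that register restriction shrinks each chunk before expansion, and that the expansion cost is paid once rather than at parse time.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar: each system is a named choice among features.
System = Tuple[str, List[str]]


def restrict(systems: List[System], excluded: Set[str]) -> List[System]:
    """Register restriction: drop features the target register never uses."""
    reduced = []
    for name, features in systems:
        kept = [f for f in features if f not in excluded]
        if kept:
            reduced.append((name, kept))
    return reduced


def precompile(systems: List[System],
               chunk_specs: Dict[str, List[str]]) -> Dict[str, List[Dict[str, str]]]:
    """Expand each named sub-description once, ahead of parsing."""
    by_name = dict(systems)
    chunks = {}
    for chunk, names in chunk_specs.items():
        present = [n for n in names if n in by_name]
        chunks[chunk] = [dict(zip(present, combo))
                         for combo in product(*(by_name[n] for n in present))]
    return chunks


systems = [
    ("mood", ["declarative", "interrogative", "imperative"]),
    ("voice", ["active", "passive"]),
    ("number", ["singular", "plural"]),
]
chunk_specs = {"clause": ["mood", "voice"], "nominal": ["number"]}

full = precompile(systems, chunk_specs)
reduced = precompile(restrict(systems, {"imperative"}), chunk_specs)
```

Here the unrestricted clause chunk expands to six terms and the restricted one to four; a parser would then look up the precompiled chunk it needs instead of expanding the whole grammar.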
Future work will attempt to reduce this parsing
time. Four directions are being followed:
- Streamlining the unification process.
- Moving more processing to the pre-compilation stage.
- Reducing the complexity of the description
without reducing its coverage.
- Incorporating heuristics to resolve
ambiguities without full expansion.
Mick O'Donnell
Fri Jan 26 19:21:43 GMT 1996