A key requirement for the use of any stochastic search approach is the ability to assess the quality of a possible solution. Thus we are forced to confront directly the task of evaluating RST trees.
We assign a candidate tree a score that is the sum of the scores for the particular features it exhibits. A positive score indicates a desirable feature; a negative score indicates an undesirable one.
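The additive scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `evaluate` and `FeatureScorer` are hypothetical, and the tree representation is left abstract so that each feature scorer can inspect whatever it needs.

```python
from typing import Any, Callable, Iterable

# Hypothetical type: a feature scorer maps a candidate tree to the
# (positive or negative) score contributed by one feature.
FeatureScorer = Callable[[Any], int]


def evaluate(tree: Any, scorers: Iterable[FeatureScorer]) -> int:
    """Total score of a candidate tree: the sum of its per-feature scores."""
    return sum(scorer(tree) for scorer in scorers)


# Usage: two toy scorers, one penalising and one rewarding.
print(evaluate("some tree", [lambda t: -10, lambda t: 21]))
```

Because the total is a plain sum, features can be added or reweighted independently, and a stochastic search only needs to compare these totals between candidates.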
We do not claim to have the best way of evaluating RST trees. The problem is far too complex, and our knowledge of the issues involved so meagre, that only a token gesture can be made at this point. We offer the following evaluation scheme merely so that the basis of our experiments is clear, and because we believe that some of the ideas point in the right direction. We score for the following features.
We assume that the entity that the text is ``about'' is specified with the input. It is highly desirable that the ``top nucleus'' (most important nucleus) of the text be about this entity. We also prefer texts that use interesting relations. We score as follows:
-10 for a top nucleus not mentioning the subject of the text
-30 for a joint relation
+21 for a relation other than joint and elaboration
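A sketch of these three scores, under assumptions of our own: trees are built from hypothetical `Fact` leaves and binary `Rel` nodes with one nucleus and one satellite (so multinuclear relations such as joint are simplified to that shape), and the top nucleus is found by following nucleus links down from the root. Elaboration scores zero.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass(frozen=True)
class Fact:
    subject: str
    verb: str
    entities: frozenset  # every entity the fact mentions, subject included


@dataclass(frozen=True)
class Rel:
    name: str
    nucleus: "Node"
    satellite: "Node"  # simplification: joint is also treated as binary


Node = Union[Fact, Rel]


def top_nucleus(node: Node) -> Fact:
    """Follow nucleus links from the root down to a leaf fact."""
    while isinstance(node, Rel):
        node = node.nucleus
    return node


def relations(node: Node) -> Iterator[Rel]:
    """Yield every relation node in the tree."""
    if isinstance(node, Rel):
        yield node
        yield from relations(node.nucleus)
        yield from relations(node.satellite)


def salience_and_relation_score(tree: Node, about: str) -> int:
    score = 0
    if about not in top_nucleus(tree).entities:
        score -= 10  # top nucleus does not mention the subject of the text
    for rel in relations(tree):
        if rel.name == "joint":
            score -= 30
        elif rel.name != "elaboration":
            score += 21  # an "interesting" relation
    return score
```

For a tree whose top nucleus mentions the input entity and which contains one elaboration and one other non-joint relation, the score is +21.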
Scott and de Souza [Scott and de Souza 90] say that the greater the amount of intervening text between the propositions of a relation, the more difficult it will be to reconstruct its message.
We score as follows:
-4 for each fact that will come textually between a satellite and its nucleus
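The text does not spell out how ``between'' is measured when the nucleus and satellite are whole subtrees. The sketch below assumes that the distance is taken between the top nuclei of the two subtrees in the linear order of the text, and that each relation carries a hypothetical `satellite_first` flag giving its ordering. It also assumes that each leaf fact object occurs only once in the tree.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Fact:
    text: str


@dataclass(frozen=True)
class Rel:
    name: str
    nucleus: "Node"
    satellite: "Node"
    satellite_first: bool = False  # hypothetical ordering flag


Node = Union[Fact, Rel]


def linearize(node: Node) -> List[Fact]:
    """The facts in the order in which they will appear in the text."""
    if isinstance(node, Fact):
        return [node]
    n, s = linearize(node.nucleus), linearize(node.satellite)
    return s + n if node.satellite_first else n + s


def top_nucleus(node: Node) -> Fact:
    while isinstance(node, Rel):
        node = node.nucleus
    return node


def distance_score(tree: Node) -> int:
    pos = {id(f): i for i, f in enumerate(linearize(tree))}
    score, stack = 0, [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, Rel):
            n = pos[id(top_nucleus(node.nucleus))]
            s = pos[id(top_nucleus(node.satellite))]
            score -= 4 * (abs(n - s) - 1)  # -4 per intervening fact
            stack += [node.nucleus, node.satellite]
    return score
```

With facts a, b, c, the tree cause(elaboration(a, b), c) is realised as a b c; b falls between the top nucleus a and the satellite c, so the score is -4.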
Our relations have preconditions: facts that should be conveyed before the relation itself.
We score as follows:
-20 for an unsatisfied precondition for a relation
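One possible reading of this score is sketched below. The assumptions are ours: each relation node carries a hypothetical `preconditions` tuple naming the facts it depends on, a nucleus is always realised before its satellite, fact texts are unique, and a precondition counts as satisfied only if its fact appears in the text before the first fact that the relation covers.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Fact:
    text: str  # assumed unique within a tree


@dataclass(frozen=True)
class Rel:
    name: str
    nucleus: "Node"
    satellite: "Node"
    # Hypothetical field: texts of facts that must be conveyed before this relation.
    preconditions: Tuple[str, ...] = ()


Node = Union[Fact, Rel]


def linearize(node: Node) -> List[Fact]:
    """Textual order, assuming the nucleus always precedes its satellite."""
    if isinstance(node, Fact):
        return [node]
    return linearize(node.nucleus) + linearize(node.satellite)


def precondition_score(tree: Node) -> int:
    pos = {f.text: i for i, f in enumerate(linearize(tree))}
    score, stack = 0, [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, Rel):
            # Position of the first fact covered by this relation.
            start = min(pos[f.text] for f in linearize(node))
            for p in node.preconditions:
                if p not in pos or pos[p] >= start:
                    score -= 20  # precondition missing or conveyed too late
            stack += [node.nucleus, node.satellite]
    return score
```

A missing precondition and one conveyed after the relation are penalised identically here; the original scheme may distinguish them.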
We do not have a complex model of focus development through the text, though developing such a model would be worthwhile. As McKeown and others have done, we prefer certain transitions over others. If consecutive facts mention the same entities or verb, the prospects for aggregation are greater, and this is usually desirable.
We score as follows:
-9 for a fact (apart from the first) not mentioning any previously mentioned entity
-3 for a fact not mentioning any entity in the previous fact, but whose subject is a previously mentioned entity
+3 for a fact retaining the subject of the last fact as its subject
+3 for a fact using the same verb as the previous one
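These transition scores apply to the facts in their final textual order. The sketch below takes that order as a plain list and represents each fact with a hypothetical `Fact` record holding a subject, a verb, and the other entities it mentions. Note that the -9 and -3 conditions can never both fire: the -3 case requires the subject to be previously mentioned, which rules out the -9 case.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Fact:
    subject: str
    verb: str
    objects: FrozenSet[str] = frozenset()  # entities mentioned besides the subject

    @property
    def entities(self) -> FrozenSet[str]:
        return self.objects | {self.subject}


def focus_score(facts: List[Fact]) -> int:
    score = 0
    seen: set = set()  # entities mentioned so far
    for i, f in enumerate(facts):
        if i > 0:
            prev = facts[i - 1]
            if not (f.entities & seen):
                score -= 9  # no previously mentioned entity
            if not (f.entities & prev.entities) and f.subject in seen:
                score -= 3  # breaks with the previous fact, but subject is known
            if f.subject == prev.subject:
                score += 3  # subject retained
            if f.verb == prev.verb:
                score += 3  # same verb as the previous fact
        seen |= f.entities
    return score
```

For example, a fact with subject a and object x, followed by one about b and y (-9), followed by one about a and z (-3), scores -12 in total.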
When an entity is first introduced as the subject of a fact, that fact is usually a very general statement about the entity. Preferring this introduces a mild schema-like influence into the system.
We score as follows:
+3 for the first fact with a given entity as subject having verb ``is''
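This last score can be sketched by scanning the facts in textual order and checking, for each entity, only the first fact that has it as subject. As in the other sketches, the `Fact` record is a hypothetical representation, here holding just a subject and a verb.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    subject: str
    verb: str


def introduction_score(facts: List[Fact]) -> int:
    score = 0
    introduced: set = set()  # entities that have already appeared as a subject
    for f in facts:
        if f.subject not in introduced:
            introduced.add(f.subject)
            if f.verb == "is":
                score += 3  # entity introduced by a general "is" statement
    return score
```

Later ``is'' facts about an entity that has already been a subject earn nothing, so the bonus rewards only the ordering in which the general statement comes first.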