Rhetorical Structure Theory (RST: Mann & Thompson 1987, Mann et al. 1993) is a theory of discourse structure used widely throughout the text generation community, and the wider discourse community. RST analyses a text in terms of a dependency tree, with each node of the tree being a segment of text. Each branch of the tree represents the relationship between a node (the nucleus) and an item of text whose occurrence is dependent on that node (the satellite). Figure 1 shows the RST analysis of a short text (from Mann & Thompson 1987).
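The dependency-tree view described above can be sketched as a small data structure. This is only an illustration: the class name `Segment`, its fields, and the example sentences are hypothetical and are not taken from RST-Tool or from Figure 1. The sketch also covers only nucleus-satellite links and leaves out multinuclear relations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One node of the tree: a segment of text (hypothetical structure)."""
    text: str
    # Label of the relation linking this segment to its nucleus;
    # None for the root, which depends on nothing.
    relation: Optional[str] = None
    satellites: List["Segment"] = field(default_factory=list)

    def attach(self, satellite: "Segment", relation: str) -> "Segment":
        """Make `satellite` depend on this segment under `relation`."""
        satellite.relation = relation
        self.satellites.append(satellite)
        return satellite


def render(node: Segment, depth: int = 0) -> List[str]:
    """Return one indented line per segment, nucleus before its satellites."""
    label = f"[{node.relation}] " if node.relation else ""
    lines = ["  " * depth + label + node.text]
    for sat in node.satellites:
        lines.extend(render(sat, depth + 1))
    return lines


# Invented three-segment text, used only to exercise the structure.
root = Segment("The tool speeds up analysis.")
elab = root.attach(Segment("Relations are drawn with the mouse."), "elaboration")
elab.attach(Segment("No lines need crossing out."), "evidence")
print("\n".join(render(root)))
```

A satellite can itself act as the nucleus of further satellites, which is what gives the analysis its tree shape.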
Performing an RST analysis of a text by hand is a messy business. One starts to draw lines between bits of text, changes one's mind, crosses out the lines, draws new lines, and before long, one has an unreadable mess. Moving the whole process onto the computer simplifies it, making the analysis quicker and allowing analyses to be altered without too much mess.
In my present project, Ilex, we need to RST-analyse a substantial body of text, for use in a generation system which intermixes generated and (annotated) canned text. The analysis also needs to be entered into the computer in a machine-readable form. Rather than using text-based entry methods, we have developed a graphical interface to facilitate the analysis and markup of RST structure. This paper describes this tool, which we call RST-Tool.
Figure 1: RST Analysis of a Short Text
Using this tool, one simply drags the mouse between segments of text to establish a relation between segments, and is then offered a list of labels to apply to that relation. Complex text structures can thus be analysed quickly. The interface is easy and intuitive to use.
Another application of the tool is variable-length document presentation: on-line documents whose length can be adjusted to the user's demands. Text marked up using the RST-Tool can then be presented on the web by a program which knows how to summarise this text on the basis of its RST-structure. See my paper in this volume for details.
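To make the idea of length adjustment concrete, here is a minimal sketch of one possible strategy: keep every nucleus and include satellites only down to a chosen depth. This is an assumption made for illustration and is not the algorithm of the presentation program mentioned above; the names `Node` and `summarise` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A text segment with the satellites that depend on it (hypothetical)."""
    text: str
    satellites: List["Node"] = field(default_factory=list)


def summarise(node: Node, max_depth: int) -> List[str]:
    """Keep the nucleus and any satellites at most `max_depth` levels below it.

    Assumes, for simplicity, that satellites follow their nucleus in the text,
    so the nucleus is emitted first.
    """
    kept = [node.text]
    if max_depth > 0:
        for sat in node.satellites:
            kept.extend(summarise(sat, max_depth - 1))
    return kept


# Invented example: B and D depend on A, and C depends on B.
doc = Node("A", [Node("B", [Node("C")]), Node("D")])
for depth in range(3):
    print(depth, summarise(doc, depth))
```

A larger `max_depth` yields a longer version of the same document, while depth 0 reduces it to the top-level nucleus alone.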
The tool is used in two stages. Firstly, the user marks the segment boundaries throughout the text (see section 2). Secondly, the user graphically links these segments together into an RST-tree (see section 3). Each of these tasks has a separate interface within the tool.
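The two stages can be sketched in a few lines. This is an illustration under stated assumptions, not RST-Tool's internal representation: the functions `mark_segments` and `link`, the use of character offsets for boundaries, and the example text are all hypothetical.

```python
from typing import Dict, List, Tuple


def mark_segments(text: str, boundaries: List[int]) -> List[str]:
    """Stage 1: cut the text into segments at the given character offsets."""
    cuts = [0] + sorted(boundaries) + [len(text)]
    return [text[a:b].strip() for a, b in zip(cuts, cuts[1:])]


def link(links: Dict[int, Tuple[int, str]],
         satellite: int, nucleus: int, relation: str) -> None:
    """Stage 2: record that segment `satellite` depends on segment `nucleus`.

    `links` maps a satellite index to (nucleus index, relation label).
    A segment may have only one nucleus, and a link may not create a cycle,
    so the result stays a tree.
    """
    if satellite in links:
        raise ValueError("segment already has a nucleus")
    node = nucleus
    while True:  # walk up from the nucleus towards the root
        if node == satellite:
            raise ValueError("link would create a cycle")
        if node not in links:
            break
        node = links[node][0]
    links[satellite] = (nucleus, relation)


# Invented example text; boundaries are located by searching for substrings.
text = "Drag the mouse. A menu appears. Pick a label."
segments = mark_segments(text, [text.index(" A menu"), text.index(" Pick")])

links: Dict[int, Tuple[int, str]] = {}
link(links, satellite=1, nucleus=0, relation="elaboration")
link(links, satellite=2, nucleus=1, relation="elaboration")
print(segments)
print(links)
```

Keeping segmentation separate from linking mirrors the two interfaces: the segment list can be fixed first, and relations can then be added or changed without touching the boundaries.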
The RST-Tool is written in Tcl/Tk, and is freely available for Unix, Mac and PC platforms. See http://www.dai.ed.ac.uk/staff/personal_pages/micko/RSTTool/ for details.