Text studies often require us to spend considerable time coding our texts: splitting them into segments of an appropriate size, and assigning features of some kind (discoursal, syntactic, etc.) to each segment. We then face the problem of re-representing the coded information in a format that can be used for statistical analysis.
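To illustrate the re-representation step, the sketch below assumes a very simple form for coded text: each segment is paired with the set of features assigned to it. It turns these into a segment-by-feature table of 0/1 values, plus a count of how often each feature was assigned, which is the kind of table a statistics package can read. The data, the function names (`feature_table`, `feature_counts`), and the table layout are hypothetical and are not the WAG Coder's own export format.

```python
from collections import Counter

# Hypothetical coded segments: each text unit is paired with the set of
# features the analyst assigned to it.
coded_segments = [
    ("seg1", {"clause", "transitive", "active"}),
    ("seg2", {"clause", "intransitive"}),
    ("seg3", {"clause", "transitive", "passive"}),
]

def feature_table(segments):
    """Return (header, rows): one row per segment, one 0/1 column per feature."""
    # Collect every feature used anywhere, in a stable (alphabetical) order.
    features = sorted(set().union(*(feats for _, feats in segments)))
    header = ["segment"] + features
    rows = [[seg_id] + [1 if f in feats else 0 for f in features]
            for seg_id, feats in segments]
    return header, rows

def feature_counts(segments):
    """Count how many segments carry each feature."""
    return Counter(f for _, feats in segments for f in feats)

header, rows = feature_table(coded_segments)
print(header)   # ['segment', 'active', 'clause', 'intransitive', 'passive', 'transitive']
for row in rows:
    print(row)  # first row: ['seg1', 1, 1, 0, 0, 1]
print(feature_counts(coded_segments)["clause"])  # 3
```

Writing `header` and `rows` out with the standard library's `csv.writer` would give a file that most statistical packages can import.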
Ideally, the text would be coded automatically, using a tagger, syntactic parser, or semantic analyser. Unfortunately, the scope of such tools is limited, both in syntactic coverage and in semantic depth, particularly when discoursal features are being coded.
The alternative to fully automatic coding is semi-automatic coding. Over the last few years, I have been developing a software tool to semi-automate some of the processes involved in coding text. The result of this work is the "WAG Coder", one module of the Workbench for Analysis and Generation (WAG) system, a system for single-sentence analysis and generation (O'Donnell 1994). The program runs on Macintosh computers.
The WAG Coder uses a menu-driven, window-based interface to make the coding task as simple as possible. The user is presented with a set of linguistic alternatives (choices) and selects one of them: double-clicking on one of the offered features records the choice, and the next set of choices is then presented.
The Coder can be set up to code text units at any linguistic level, for instance for graphological status, discoursal features, or sociological variables. However, the user must supply the coding scheme: a statement of the features to be coded, including which of these features are mutually exclusive. In systemic terms, a set of mutually exclusive features is called a system.
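A coding scheme of this kind can be held as a mapping from system names to sets of features. The minimal sketch below, with hypothetical system names and features, uses that representation to check that a coded unit takes at most one feature from each system. The representation and the `check_coding` function are assumptions for illustration, not the format in which the WAG Coder stores its schemes.

```python
# Hypothetical coding scheme: each system is a named set of mutually
# exclusive features. Names and features are illustrative only.
SCHEME = {
    "RANK": {"clause", "group", "word"},
    "MOOD": {"declarative", "interrogative", "imperative"},
}

def check_coding(chosen, scheme):
    """Return {system: clashing features} for every system from which more
    than one feature was chosen. An empty dict means the coding respects
    the mutual-exclusion constraints of the scheme."""
    clashes = {}
    for name, features in scheme.items():
        overlap = set(chosen) & features
        if len(overlap) > 1:
            clashes[name] = sorted(overlap)
    return clashes

print(check_coding({"clause", "declarative"}, SCHEME))  # {}
print(check_coding({"clause", "word"}, SCHEME))         # {'RANK': ['clause', 'word']}
```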
It is useful to avoid presenting choices that do not apply to the current unit. For instance, if we are coding an intransitive clause, it makes no sense to ask whether the clause is active or passive. We avoid this problem by using a systemic network (systems organised into an inheritance network) to represent the relations between features: some systems are made dependent on particular features having been chosen earlier, so the systems are ordered by dependency.
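The dependency ordering can be sketched as follows. Each system in this hypothetical network fragment has an entry condition, simplified here to a single feature that must already have been chosen (full systemic networks also allow conjunctive and disjunctive entry conditions). The procedure presents only those systems whose entry condition is satisfied, so a unit coded as intransitive is never asked about voice. The `choose` callback stands in for the user's double-click on a menu item; `NETWORK`, `code_unit`, and the system names are illustrative and are not taken from the WAG Coder itself.

```python
# Hypothetical network fragment: (system name, entry feature or None for
# the root system, mutually exclusive features).
NETWORK = [
    ("RANK", None, ["clause", "group"]),
    ("TRANSITIVITY", "clause", ["transitive", "intransitive"]),
    ("VOICE", "transitive", ["active", "passive"]),
]

def code_unit(network, choose):
    """Present each system whose entry condition is met; return the chosen
    features in the order they were chosen.

    `choose(system_name, features)` plays the role of the user and must
    return one of `features`.
    """
    chosen = []
    pending = list(network)
    progress = True
    # Keep sweeping until no further system becomes available.
    while progress:
        progress = False
        for system in list(pending):  # iterate over a copy while removing
            name, entry, features = system
            if entry is None or entry in chosen:
                pick = choose(name, features)
                if pick not in features:
                    raise ValueError(f"{pick!r} is not an option in {name}")
                chosen.append(pick)
                pending.remove(system)
                progress = True
    return chosen

# Usage: a scripted "user" that records which systems were presented.
answers = {"RANK": "clause", "TRANSITIVITY": "intransitive", "VOICE": "active"}
asked = []

def scripted_choose(system, features):
    asked.append(system)
    return answers[system]

print(code_unit(NETWORK, scripted_choose))  # ['clause', 'intransitive']
print(asked)                                # ['RANK', 'TRANSITIVITY']
```

Because VOICE depends on `transitive`, it is never presented for an intransitive clause; choosing `transitive` instead would lead the procedure on to the VOICE system.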
The WAG Coder was developed under the Electronic Discourse Analyser project, funded by Fujitsu (Japan) and based in Sydney (Matthiessen et al. 1991). Needing grammatical profiles of our target texts, and lacking tools to analyse them, we developed the Coder to help build these profiles. The Coder was further developed under an NSF-funded project studying the register of newspaper articles, as part of a wider goal of making the output of a text generation system sensitive to register variation (see Bateman & Paris 1989a, 1989b; Paris & Bateman 1990).