Coding is generally used as a first step in statistical analysis. The Coder has been designed as a module in this process. Consequently, the Coder can export the codings in a form readable by a statistical processor.
At present, tab-delimited format is supported. The user can also select which of the features are to be exported, rather than exporting all the data. In our NSF-funded register study, the exported codings are imported into the Microsoft Excel package, or into a statistical package called Statview.
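The export step described above can be illustrated with a minimal sketch. It is not the Coder's own code: the record layout, the feature names (text_id, editorial, tense, voice) and the function export_codings are all hypothetical, chosen only to show a tab-delimited export restricted to the features the user selects.

```python
import csv
import io

# Hypothetical codings: one record per coded unit (e.g. a clause).
# The feature names are illustrative assumptions, not the Coder's own.
codings = [
    {"text_id": "t1", "editorial": 1, "tense": "simple-present", "voice": "active"},
    {"text_id": "t1", "editorial": 1, "tense": "simple-past", "voice": "passive"},
    {"text_id": "t2", "editorial": 0, "tense": "simple-past", "voice": "active"},
]


def export_codings(records, features, out):
    """Write only the selected features, one coded unit per row, tab-delimited."""
    writer = csv.DictWriter(
        out,
        fieldnames=features,
        delimiter="\t",
        extrasaction="ignore",  # silently drop features that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)


# Export three of the four features into an in-memory buffer.
buffer = io.StringIO()
export_codings(codings, ["text_id", "editorial", "tense"], buffer)
exported = buffer.getvalue()
```

Writing to a file object rather than a fixed path keeps the sketch testable; in practice the buffer would be replaced by an open file that the spreadsheet or statistical package then imports.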
Once in a statistical package, codings can be treated in two ways:
In the unaggregated approach, we include features for the text-type in the coding (e.g. editorial=0/1). We can then statistically analyse the relationship between these text-type features and the other linguistic features. We typically perform the following analyses:
Tense             Editorial   Non-Editorial
Simple-past       15%         42%
Simple-present    33%         13%
Simple-future     8%          3%
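A table of this kind can be derived from unaggregated codings by counting each tense within each text-type group. The following is a sketch only, under the assumption that each clause carries an editorial flag and a tense feature; the data are invented for illustration and are not the figures from the study, and the function name tense_percentages is hypothetical.

```python
from collections import Counter

# Hypothetical unaggregated codings: one (editorial, tense) pair per clause.
# These values are made up and do not reproduce the study's data.
clauses = [
    (1, "simple-present"), (1, "simple-present"), (1, "simple-past"), (1, "other"),
    (0, "simple-past"), (0, "simple-past"), (0, "simple-present"), (0, "other"),
]


def tense_percentages(records):
    """Return {editorial_flag: {tense: percentage of that group's clauses}}."""
    totals = Counter(flag for flag, _ in records)  # clauses per text-type
    counts = Counter(records)                      # clauses per (text-type, tense)
    table = {}
    for (flag, tense), n in counts.items():
        table.setdefault(flag, {})[tense] = 100.0 * n / totals[flag]
    return table


table = tense_percentages(clauses)
```

Percentages are computed within each text-type column, so each column sums to 100% across all tense values, including any not shown in a printed table.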
The aggregated approach offers better data for cluster analysis techniques: such techniques provide groupings of the texts, which we can then interpret as statistically derived text-types. It remains for the analyst to label the text-types. Unfortunately, the aggregated approach requires far more coding, since we need to code a significant number of texts (rather than a significant number of, say, clauses).
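As an illustration of why the aggregated approach needs many texts, the sketch below collapses clause-level codings into one row per text. It assumes that aggregation means taking the proportion of each feature value within a text; the text does not state this, so that reading, the data and the function name aggregate_by_text are all assumptions. Each resulting row is a single data point for a cluster analysis.

```python
from collections import Counter, defaultdict

# Hypothetical clause-level codings: (text_id, tense). Invented data.
clauses = [
    ("t1", "simple-present"), ("t1", "simple-present"), ("t1", "simple-past"),
    ("t2", "simple-past"), ("t2", "simple-past"),
]


def aggregate_by_text(records, values):
    """Collapse clause-level codings into one row of proportions per text."""
    per_text = defaultdict(Counter)
    for text_id, value in records:
        per_text[text_id][value] += 1
    profiles = {}
    for text_id, counts in per_text.items():
        total = sum(counts.values())
        # One proportion per feature value, in the order given by `values`.
        profiles[text_id] = [counts[v] / total for v in values]
    return profiles


profiles = aggregate_by_text(clauses, ["simple-past", "simple-present"])
```

Here five coded clauses yield only two data points, one per text, which is the sense in which the aggregated approach demands far more coding before clustering becomes meaningful.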