Next: Summary Up: From Corpus to Codings: Previous: Post-Editing of Codings

Exporting the Data for Statistical Analysis

Coding is generally used as a first step in statistical analysis. The Coder has been designed as a module in this process. Consequently, the Coder can export the codings in a form readable by a statistical processor.

At present, tab-delimited format is supported. The user can also select which features are to be exported, rather than exporting all the data. In our NSF-funded register study, the exported codings are imported into Microsoft Excel, or into a statistical package called StatView.
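As a rough illustration of a tab-delimited export restricted to selected features, here is a minimal Python sketch. It is not the Coder's own code: the `codings` records, the feature names, and the `export_codings` function are hypothetical, and the layout (a header row followed by one row per coding) is an assumption.

```python
import csv
import io

# Hypothetical codings: one record per coded unit (e.g. a clause).
# A value of 1 means the feature was selected, 0 that it was not.
codings = [
    {"id": 1, "editorial": 1, "passive": 0, "modal": 1},
    {"id": 2, "editorial": 0, "passive": 1, "modal": 0},
]

def export_codings(codings, features):
    """Return the chosen features as tab-delimited text, one row per coding."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=features,
        delimiter="\t",
        extrasaction="ignore",   # silently skip features that were not selected
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(codings)
    return buffer.getvalue()

# Export only three of the four columns.
exported = export_codings(codings, ["id", "editorial", "passive"])
print(exported)
```

Text in this shape can be saved to a file and opened by a spreadsheet or statistical package that reads tab-delimited data.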

Once in a statistical package, codings can be treated in two ways:

  1. unaggregated: the codings are used as is, each coding representing one case;

  2. aggregated: the codings are grouped into text-units (e.g., by newspaper article), and each feature value is averaged over the text-unit. The statistical data thus consist of one case per text-unit.
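The aggregation step can be sketched as follows. This is an illustration under assumptions, not the procedure used in any particular statistical package: the `text` key identifying the text-unit, the feature names, and the `aggregate` function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical unaggregated codings: one case per coded unit,
# each tagged with the text-unit (here, a newspaper article) it came from.
codings = [
    {"text": "article1", "passive": 1, "modal": 0},
    {"text": "article1", "passive": 0, "modal": 0},
    {"text": "article2", "passive": 1, "modal": 1},
]

def aggregate(codings, unit_key, features):
    """Average each feature over the codings belonging to the same text-unit."""
    groups = defaultdict(list)
    for coding in codings:
        groups[coding[unit_key]].append(coding)
    return {
        unit: {f: sum(row[f] for row in rows) / len(rows) for f in features}
        for unit, rows in groups.items()
    }

# One case per text-unit; each value is the proportion of codings with the feature.
cases = aggregate(codings, "text", ["passive", "modal"])
print(cases)
```

With 0/1 features, the average over a text-unit is simply the proportion of its codings that carry the feature.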

In the unaggregated approach, I include features for the text-type in the coding (e.g., editorial=0/1). We can then statistically analyse the relationship between these text-type features and the other linguistic features. I typically perform the following analyses:

The aggregated approach offers better data for cluster analysis techniques: such techniques provide groupings of the texts, which we can then interpret as statistically derived text-types. It remains for the analyst to label the text-types. Unfortunately, the aggregated approach requires far more coding, since we need to code a significant number of texts (rather than a significant number of, say, clauses).
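To make the idea of grouping texts concrete, here is a small Python sketch of one possible cluster analysis. The text does not say which clustering technique is used; single-linkage agglomerative clustering is chosen here purely as an example, and the `texts` data and the `cluster` function are hypothetical.

```python
import math

# Hypothetical aggregated data: one case per text, each value a feature
# proportion (e.g. [proportion passive, proportion modal]).
texts = {
    "article1": [0.10, 0.80],
    "article2": [0.15, 0.75],
    "article3": [0.90, 0.20],
    "article4": [0.85, 0.10],
}

def cluster(texts, k):
    """Single-linkage agglomerative clustering: merge the two closest
    groups repeatedly until only k groups remain."""
    clusters = [[name] for name in texts]

    def linkage(a, b):
        # Distance between two groups = distance between their closest members.
        return min(math.dist(texts[x], texts[y]) for x in a for y in b)

    while len(clusters) > k:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because j > i
    return clusters

groups = cluster(texts, 2)
print(groups)
```

The procedure returns only the groupings; deciding what each group represents (for instance, which one corresponds to editorials) is still left to the analyst.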






Mick O'Donnell
Thu Jan 25 17:20:03 GMT 1996