Principal Investigators (UAM): Mick O'Donnell, Susana Murcia
Research assistants (becarios): Ismael Pascual Nieto, Irene Eleta, Jose Maria Martinez, Cesar Dante, Fernando Maquedano

Project Description

This project was funded by the Ministerio de Industria, Turismo y Comercio from January 2006 until March 2007. The project, under the PROFIT call, was a cooperation between Seinet, Eurotech and UAM to increase the suitability of Seinet's content management system, Xtent, for working with technical publications, such as those used at Eurotech. The role of UAM was to integrate automatic translation software into Xtent.

However, one of our first results was the discovery that none of the available MT systems was suitable for this purpose: they either did not function in server mode (a requirement of the project) or, if they did, their cost was well outside the budget.
For this reason, we redirected our component of the project towards developing our own MT system. Our eventual goal is to build a complete English-Spanish MT system for technical documentation. Only some of the necessary tasks were completed within the project period. During that time, we developed software to:
Future Work

We are currently extending the system. Firstly, we are building a GUI, which will be available to researchers working with parallel corpora, for use in dictionary creation, sentence alignment and word alignment. Secondly, we have developed software for Named Entity Recognition, which needs to be incorporated into the current system (so that named entities are aligned as a phrase, not as single words). Our eventual goal is to produce sentence patterns by extracting the NPs and adverbs from sentences and normalising for tense/aspect/number. Our system will thus produce a Translation Memory consisting of a set of source-language sentence patterns and the corresponding sentence patterns in the target language. These sentence patterns will then form a resource for automatic translation of new texts.

Publications