Subject:   Automation.
Author:   Niessen, Sonja
Year:   2002
Title:   Improving statistical machine translation using morpho-syntactic information
Place:   Aachen
Publisher/Journal:   Rheinisch-Westfälischen Technischen Hochschule Aachen
Pages:   123
Language:   English.
Type:   Thesis.
ISBN/ISSN/DOI:   ISBN: 967608864.
Availability:   Open access.
Abstract:   Large differences in word order between corresponding sentences are difficult to capture for automatic alignment algorithms. In this work, a range of sentence level restructuring transformations is introduced, which are motivated by knowledge about the sentence structure in the involved languages. These transformations aim at the assimilation of word orders in related sentences. A detailed analysis of the effect on the corpora and the translation quality reveals that their application results in better alignments and as a consequence in less noisy probabilistic lexica, broader applicability of multi-word phrase pairs and a better coverage of the language model.
Existing statistical systems for machine translation often treat different inflected forms of the same lemma as if they were independent of each other. A better exploitation of the bilingual training data can be achieved by explicitly taking into account the interdependencies of the related inflected forms. In this work a hierarchy of equivalence classes is defined on the basis of morphological and syntactic information beyond the surface forms. Features from those hierarchy levels are combined to form hierarchical lexicon models which can replace the standard probabilistic lexicon used in most statistical machine translation systems. The benefit from these combined models is twofold: Firstly, the lexical coverage is improved, because the translation of unseen word forms can be derived by considering information from lower levels in the hierarchy. Secondly, category ambiguity can be resolved, because syntactical context information is made locally accessible by means of annotation with morpho-syntactic tags. [Source: Author]
