Departamento de
Traducción e Interpretación


Subject:   Machine translation.
Author:   Yahyaei, Mohammad Sirvan
Year:   2012
Title:   Reordering in statistical machine translation
Place:   London
Publisher/Journal:   Queen Mary University of London
Pages:   163
Language:   English.
Type:   Thesis.
Availability:   Open access.
Contents:   1. Statistical machine translation; 2. Reordering in statistical machine translation; 3. Decoding by dynamic chunking; 4. Dynamic distortion in a discriminative reordering model; 5. Evaluation of NER on SMT output; 6. Cross-lingual fragment alignment using DFR.
Abstract:   The main focus of this work is reordering, one of the major problems in machine translation (MT) and in statistical MT (SMT), the method investigated in this research. The reordering problem in SMT originates from the fact that not all the words in a sentence can be translated consecutively. Words must be skipped and translated out of their source-sentence order to produce a fluent and grammatically correct sentence in the target language. The main reason reordering is needed is the fundamental difference in word order between languages; reordering therefore becomes a more dominant issue the more the source and target languages differ structurally. The aim of this thesis is to study the reordering phenomenon by proposing new methods of dealing with reordering in SMT decoders and by evaluating the effectiveness of these methods and the importance of reordering in the context of natural language processing (NLP) tasks. In other words, we propose novel ways of performing decoding to improve the reordering capabilities of the SMT decoder, and we explore the effect of improved reordering on the quality of specific NLP tasks, namely named entity recognition and cross-lingual text association. We also go beyond reordering in text association and present a method for cross-lingual text fragment alignment based on models of divergence from randomness. The main contribution of this thesis is a novel method named dynamic distortion, which is designed to improve the ability of the phrase-based decoder to perform reordering by adjusting the distortion parameter based on the translation context. The model employs a discriminative reordering model, which combines several features, including lexical and syntactic ones, to predict the necessary distortion limit for each sentence and each hypothesis expansion. The discriminative reordering model is also integrated into the decoder as an extra feature. The method achieves substantial improvements over the baseline without increasing decoding time, by avoiding reordering in unnecessary positions. [Source: Author]
