Department of
Translation and Interpreting


Subject:   Machine translation.
Author:   Williams, Philip James
Year:   2014
Title:   Unification-based constraints for statistical machine translation
Place:   Edinburgh
Publisher/Journal:   University of Edinburgh
Pages:   167
Language:   English.
Type:   Thesis.
Availability:   Open access.
Contents:   1. Statistical machine translation models; 2. String-to-tree translation; 3. Unification-based approaches to grammar; 4. Framework; 5. Baseline setup; 6. Agreement and government; 7. Verbal complex production; 8. Improving verbal complex production.
Abstract:   Morphology and syntax have both received attention in statistical machine translation research, but they are usually treated independently and the historical emphasis on translation into English has meant that many morphosyntactic issues remain under-researched. Languages with richer morphologies pose additional problems and conventional approaches tend to perform poorly when either source or target language has rich morphology. In both computational and theoretical linguistics, feature structures together with the associated operation of unification have proven a powerful tool for modelling many morphosyntactic aspects of natural language. In this thesis, we propose a framework that extends a state-of-the-art syntax-based model with a feature structure lexicon and unification-based constraints on the target-side of the synchronous grammar. Whilst our framework is language-independent, we focus on problems in the translation of English to German, a language pair that has a high degree of syntactic reordering and rich target-side morphology. We first apply our approach to modelling agreement and case government phenomena. We use the lexicon to link surface form words with grammatical feature values, such as case, gender, and number, and we use constraints to enforce feature value identity for the words in agreement and government relations. We demonstrate improvements in translation quality of up to 0.5 BLEU over a strong baseline model. We then examine verbal complex production, another aspect of translation that requires the coordination of linguistic features over multiple words, often with long-range discontinuities. We develop a feature structure representation of verbal complex types, using constraint failure as an indicator of translation error and use this to automatically identify and quantify errors that occur in our baseline system.
A manual analysis and classification of errors informs an extended version of the model that incorporates information derived from a parse of the source. We identify clause spans and use model features to encourage the generation of complete verbal complex types. We are able to improve accuracy as measured using precision and recall against values extracted from the reference test sets. Our framework allows for the incorporation of rich linguistic information and we present sketches of further applications that could be explored in future work. [Source: Author]
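Illustrative note (not part of the thesis): the abstract describes a lexicon that links surface forms to feature values such as case, gender, and number, with constraints that enforce feature value identity between agreeing words. The sketch below shows that general idea in Python under strong simplifying assumptions: feature structures are flat dictionaries, ambiguity is modelled as a list of alternatives per word, and the tiny German lexicon, the function names `unify` and `agree`, and the feature labels are all hypothetical and deliberately incomplete. It does not reproduce the author's framework or implementation.

```python
from itertools import product
from typing import Optional

# A flat feature structure: feature name -> atomic value (an assumption;
# real feature structures may be nested and may share substructures).
FeatureStructure = dict[str, str]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the merged structure, or None if any shared feature conflicts."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            return None  # unification failure: conflicting values
        merged[key] = value
    return merged


# Hypothetical toy lexicon: each surface form maps to its alternative
# readings. The entries are deliberately incomplete.
LEXICON: dict[str, list[FeatureStructure]] = {
    "der": [
        {"case": "nom", "gender": "masc", "number": "sg"},
        {"case": "dat", "gender": "fem", "number": "sg"},
        {"case": "gen", "gender": "fem", "number": "sg"},
    ],
    "die": [
        {"case": "nom", "gender": "fem", "number": "sg"},
        {"case": "acc", "gender": "fem", "number": "sg"},
    ],
    "Hund": [
        {"case": "nom", "gender": "masc", "number": "sg"},
        {"case": "acc", "gender": "masc", "number": "sg"},
        {"case": "dat", "gender": "masc", "number": "sg"},
    ],
}


def agree(word_a: str, word_b: str) -> list[FeatureStructure]:
    """Return every reading on which the two words agree (empty if none)."""
    results = []
    for fs_a, fs_b in product(LEXICON[word_a], LEXICON[word_b]):
        unified = unify(fs_a, fs_b)
        if unified is not None:
            results.append(unified)
    return results


if __name__ == "__main__":
    print(agree("der", "Hund"))  # one surviving reading: nom, masc, sg
    print(agree("die", "Hund"))  # no reading survives: agreement violated
```

In this toy setting, an empty result plays the role of a constraint failure: a candidate translation containing "die Hund" would be rejected or penalised, while "der Hund" unifies on the nominative masculine singular reading.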
Acknowledgements:   Record supplied by the Departament de Traducció i Interpretació i Estudis de l'Àsia Oriental (Universitat Autònoma de Barcelona).
2001-2019 Universidad de Alicante DOI: 10.14198/bitra
The Spanish version of this page is the work of Javier Franco