Department of Translation and Interpreting


Subject:   Machine translation. Corpus.
Author:   Dugast, Loic
Year:   2013
Title:   Introducing corpus-based rules and algorithms in a rule-based machine translation system
Place:   Edinburgh
Publisher/Journal:   University of Edinburgh
Pages:   91
Language:   English.
Type:   Thesis.
Availability:   Open access.
Contents:   1. Comparison of machine translation paradigms; 2. Automatic lexical rule acquisition for rule-based systems; 3. An integrated hybrid system.
Abstract:   Machine translation offers the challenge of automatically translating a text from one natural language into another. Statistical methods, originating from the field of information theory, have proven to be a major breakthrough in machine translation. Before this paradigm emerged, many systems had been developed following a rule-based approach, that is, a system based on a linguistic description of the languages involved and of how translation occurs in the mind of the (human) translator. Statistical models, by contrast, rely on empirical methods and may work with very few linguistic hypotheses about language and translation as performed by humans. This had implications for rule-based translation systems, in terms of both software architecture and the nature of the rules, which were entered manually and lacked any statistical features. In view of such diverging paradigms, it is natural to try to combine both in a hybrid system. In the present work, we start by examining the state of the art of both rule-based and statistical systems, restricting the rule-based approach to transfer-based systems. We compare the rule-based and statistical paradigms in terms of global translation quality and give a qualitative analysis of the errors specific to each. We also introduce initial black-box hybrid models that confirm the expected gain from combining the two approaches. Motivated by the qualitative analysis, we focus our study and experiments on lexical phrasal rules and propose a setup for extracting such resources from corpora. Going one step further in the integration of rule-based and statistical approaches, we then examine how to combine the extracted rules with decoding modules that allow for corpus-based handling of ambiguity. This leads to the final deliverable of this work: a rule-based system for which non-deterministic rules can be learned from corpora, and whose decoder can be optimised on a tuning set from the same domain.
[Source: Author]
Acknowledgements:   Record supplied by Departament de Traducció i Interpretació i Estudis de l'Àsia Oriental (Universitat Autònoma de Barcelona).
2001-2021 Universidad de Alicante DOI: 10.14198/bitra