Departamento de
Traducción e Interpretación


Subject:   Machine translation.
Author:   Sánchez Martínez, Felipe
Year:   2008
Title:   Using unsupervised corpus-based methods to build rule-based machine translation systems
Place:   Alicante
Publisher/Journal:   Universidad de Alicante
Pages:   170
Language:   English.
Type:   Thesis.
ISBN/ISSN/DOI:   ISBN: 9788469273647.
Availability:   Open access.
Contents:   1. Part-of-speech tagging for machine translation; 2. Pruning of disambiguation paths; 3. Part-of-speech tag clustering; 4. Automatic inference of transfer rules.
Abstract:   In recent years, corpus-based approaches to machine translation (MT), such as statistical MT or example-based MT, have grown in interest as a consequence of the increasing availability of bilingual texts in electronic format. However, corpus-based approaches are not applicable when the translation involves less-resourced language pairs for which there are no parallel corpora available, or the size of such corpora is not large enough to build a general-purpose MT system; in those cases, the rule-based approach is the only applicable solution. This is currently the case of less-resourced language pairs such as Occitan–Catalan, French–Catalan or English–Afrikaans, among others. [...] This thesis focuses on the development of unsupervised methods to obtain automatically from corpora some of the linguistic resources required to build RBMT systems; more precisely, shallow-transfer MT systems like those in whose development I have been involved. Specifically, this thesis focuses on: (i) an unsupervised method to train part-of-speech (PoS) taggers to be used in RBMT; (ii) the automatic inference of the set of states to be used by PoS taggers based on hidden Markov models for use in RBMT; and (iii) the automatic inference of shallow-transfer rules from a small amount of parallel corpora. The final goal is to reduce as much as possible the human effort needed to build an RBMT system from scratch. The approaches that will be discussed in this thesis will show that to (unsupervisedly) train PoS taggers based on hidden Markov models (HMM) there is a source of knowledge, namely, a statistical model of the target language, that can be easily used to produce PoS taggers specially suited for use in RBMT. In addition, it will show how to apply a clustering algorithm to automatically determine the set of hidden states to be used by HMM-based PoS taggers.
Finally, this thesis will demonstrate that shallow structural transfer rules can be inferred from a small amount of parallel corpora by using alignment templates like those used in statistical MT. All the approaches and methods that will be discussed in this thesis have been implemented and released as open source in order to allow the whole community to benefit from them; moreover, they have been implemented as tools for the development of new language pairs for Apertium. The public availability of the source code guarantees the reproducibility of all the experiments conducted. It also allows other researchers to improve them and saves the time and effort of people developing new language pairs for Apertium. [Source: Author]
Acknowledgements:   Record supplied by Departament de Traducció i Interpretació i Estudis de l'Àsia Oriental (Universitat Autònoma de Barcelona).
2001-2019 Universidad de Alicante DOI: 10.14198/bitra
The Spanish version of this page is the work of Javier Franco.