BITRA. BIBLIOGRAPHY OF INTERPRETING AND TRANSLATION

Subject:


Machine translation.

Author:


Malik, Muhammad Ghulam Abbas

Year:


2010

Title:


Méthodes et outils pour les problèmes faibles de traduction [Methods and Tools for Weak Problems of Translation]

Place:


Grenoble
https://tel.archives-ouvertes.fr/tel-00502192/

Publisher/Journal:


Université de Grenoble

Pages:


262

Language:


French.

Type:


Thesis.

Availability:


Open access

Abstract:


Given a source language $L_1$ and a target language $L_2$, a written translation unit $S$ of $n$ words in $L_1$ may have an exponential number $N = O(k^n)$ of valid translations $T_1, \dots, T_N$. We are interested in the case where $N$ is very small because the written forms of $L_1$ and $L_2$ are close. Our domain of investigation is the class of pairs of language and writing-system combinations $(L_iW_i, L_jW_j)$ for which there may be only one or a very small number of valid translations for any given $S$ of $L_i$ written in $W_i$. The problem of translating a Hindi/Urdu sentence written in the Urdu script into an equivalent one in Devanagari falls into this class. We call the problem of translation for such a pair a weak translation problem. We have designed and tested methods of increasing complexity for solving instances of this problem, from simple finite-state transduction to the transformation of charts of partial syntax trees, with or without the inclusion of empirical (mainly probabilistic) methods. This leads us to define the translation difficulty of a pair $(L_iW_i, L_jW_j)$ as the degree of complexity of the translation methods needed to reach a desired goal (such as an error rate below 15%). Treating transliteration or transcription as a special case of translation, we have developed a method based on the definition of a universal intermediate transcription (UIT) for given groups of $L_iW_i$ couples, and used UIT as a phonetico-graphemic pivot. For handling interdialectal translation into languages with rich inflectional morphology, we propose to perform a limited on-demand surface analysis into partial syntax trees, to use it to update and propagate features such as gender and number, and to handle word-boundary phenomena. Besides large-scale experiments, this work has led to the production of linguistic resources, such as parallel and tagged corpora, and of running systems, all freely available on the Web.
They include monolingual corpora, lexicons, morphological analyzers with limited vocabulary, phrase-structure grammars of Hindi, Punjabi and Urdu, and online web services for transliteration between Hindi & Urdu and between Punjabi (Shahmukhi) & Punjabi (Gurmukhi), among others. An interesting perspective is to apply our techniques to distant LW pairs, for which they could efficiently produce active learning presentations in the form of multiple pidgin outputs. [Source: Author]
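The abstract describes using a universal intermediate transcription (UIT) as a pivot between writing systems. The minimal Python sketch below illustrates only the general pivot idea (source script to pivot to target script). It assumes a hypothetical ASCII pivot alphabet and covers a handful of consonants; it is not the UIT defined in the thesis, and it ignores vowels, diacritics and the context-dependent rules that real Hindi/Urdu transliteration requires.

```python
# Toy pivot-based transliteration between Urdu (Perso-Arabic) and Devanagari script.
# ASSUMPTION: the pivot symbols are hypothetical ASCII labels, NOT the UIT symbols
# of the thesis; only a few consonants are covered, vowels and diacritics are ignored.

# One script -> pivot table per writing system.
DEVANAGARI_TO_PIVOT = {
    "\u0915": "k",  # DEVANAGARI LETTER KA
    "\u092C": "b",  # DEVANAGARI LETTER BA
    "\u0924": "t",  # DEVANAGARI LETTER TA
    "\u0928": "n",  # DEVANAGARI LETTER NA
    "\u092E": "m",  # DEVANAGARI LETTER MA
    "\u0932": "l",  # DEVANAGARI LETTER LA
    "\u0930": "r",  # DEVANAGARI LETTER RA
    "\u0938": "s",  # DEVANAGARI LETTER SA
}
URDU_TO_PIVOT = {
    "\u06A9": "k",  # ARABIC LETTER KEHEH
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u062A": "t",  # ARABIC LETTER TEH
    "\u0646": "n",  # ARABIC LETTER NOON
    "\u0645": "m",  # ARABIC LETTER MEEM
    "\u0644": "l",  # ARABIC LETTER LAM
    "\u0631": "r",  # ARABIC LETTER REH
    "\u0633": "s",  # ARABIC LETTER SEEN
}


def invert(table):
    """Build a pivot -> script table from a script -> pivot table."""
    return {pivot: char for char, pivot in table.items()}


PIVOT_TO_DEVANAGARI = invert(DEVANAGARI_TO_PIVOT)
PIVOT_TO_URDU = invert(URDU_TO_PIVOT)


def to_pivot(text, src_to_pivot):
    """Return the pivot transcription; unmapped characters become '?' (illustration only)."""
    return "".join(src_to_pivot.get(ch, "?") for ch in text)


def transliterate(text, src_to_pivot, pivot_to_tgt):
    """Map each character source script -> pivot -> target script."""
    out = []
    for ch in text:
        pivot = src_to_pivot.get(ch)
        if pivot is None:
            out.append(ch)  # unmapped character (space, punctuation): keep as-is
        else:
            # Fall back to the source character if the target lacks this pivot symbol.
            out.append(pivot_to_tgt.get(pivot, ch))
    return "".join(out)


if __name__ == "__main__":
    urdu = "\u06A9\u0645\u0644"  # keheh, meem, lam
    print(to_pivot(urdu, URDU_TO_PIVOT))  # kml
    print(transliterate(urdu, URDU_TO_PIVOT, PIVOT_TO_DEVANAGARI) == "\u0915\u092E\u0932")  # True
```

With a pivot, each writing system needs only one table to and from the pivot, so adding a script means adding one table rather than one per existing script. A realistic system would replace the per-character lookup with context-sensitive rules, for example finite-state transducers of the kind mentioned in the abstract.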


