Induction of rules from parallel corpus

Zaremba, Mindaugas; Laukaitis, Algirdas; Vasilecas, Olegas

Date

2008

Abstract

This paper considers approaches to translation between English and morphologically rich languages. We consider all Web-available linguistic resources for this task and integrate them into one comprehensive statistical model. Syntax parsers, bilingual and semantic dictionaries, a bilingual parallel corpus, and a monolingual Web-based corpus are taken into account. A multi-abstraction language representation is used for the statistical induction of syntactic and semantic transformation rules, called multi-alignment templates. The decoding model is described using feature functions and a log-linear modeling approach. The approach is evaluated on the Lithuanian-English language pair. The experimental results demonstrate that the multi-abstraction approach and the hybridization of learning methods can improve translation quality. All resources presented in this paper are available at www.vvam.lt.

URI

https://etalpykla.vilniustech.lt/handle/123456789/119625

Collections

Konferencijų straipsniai / Conference Articles