| Field | Value | Language |
|---|---|---|
| dc.contributor.author | Laukaitis, Algirdas | |
| dc.contributor.author | Plikynas, Darius | |
| dc.contributor.author | Ostašius, Egidijus | |
| dc.date.accessioned | 2023-09-18T17:39:11Z | |
| dc.date.available | 2023-09-18T17:39:11Z | |
| dc.date.issued | 2018 | |
| dc.identifier.issn | 0868-4952 | |
| dc.identifier.uri | https://etalpykla.vilniustech.lt/handle/123456789/124800 | |
| dc.description.abstract | In this paper, we propose a framework for extracting translation memory from a corpus of fiction and non-fiction books. In recent years, there have been several proposals for aligning bilingual corpora and extracting translation memory from legal and technical documents. However, when applied to a corpus of translated fiction and non-fiction books, existing alignment algorithms yield low-precision results. To address this problem, we propose a new method that combines existing alignment algorithms with a proactive learning approach. We define several feature functions that are used to build two classifiers, one for text filtering and one for alignment. We report results for the English-Lithuanian language pair on a bilingual corpus of 200 books. We demonstrate a significant improvement in alignment accuracy over currently available alignment systems. | eng |
| dc.format | PDF | |
| dc.format.extent | p. 693-710 | |
| dc.format.medium | text / txt | |
| dc.language.iso | eng | |
| dc.relation.isreferencedby | MatSciNet | |
| dc.relation.isreferencedby | Zentralblatt MATH (zbMATH) | |
| dc.relation.isreferencedby | Scopus | |
| dc.relation.isreferencedby | Science Citation Index Expanded (Web of Science) | |
| dc.source.uri | https://www.mii.lt/informatica/pdf/INFO1200.pdf | |
| dc.title | Sentence level alignment of digitized books parallel corpora | |
| dc.type | Straipsnis Web of Science DB / Article in Web of Science DB | |
| dcterms.references | 20 | |
| dc.type.pubtype | S1 - Straipsnis Web of Science DB / Web of Science DB article | |
| dc.contributor.institution | Vilniaus Gedimino technikos universitetas | |
| dc.contributor.institution | Vilniaus universitetas; Vilniaus Gedimino technikos universitetas | |
| dc.contributor.faculty | Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences | |
| dc.contributor.faculty | Verslo vadybos fakultetas / Faculty of Business Management | |
| dc.subject.researchfield | T 007 - Informatikos inžinerija / Informatics engineering | |
| dc.subject.en | alignment of corpora | |
| dc.subject.en | alignment of digitized books | |
| dc.subject.en | machine translation | |
| dc.subject.en | natural language processing | |
| dcterms.sourcetitle | Informatica | |
| dc.description.issue | no. 4 | |
| dc.description.volume | vol. 29 | |
| dc.publisher.name | Vilniaus universitetas Matematikos ir informatikos institutas | |
| dc.publisher.city | Vilnius | |
| dc.identifier.doi | 10.15388/Informatica.2018.188 | |
| dc.identifier.elaba | 32718030 | |