
dc.contributor.author: Laukaitis, Algirdas
dc.contributor.author: Plikynas, Darius
dc.contributor.author: Ostašius, Egidijus
dc.date.accessioned: 2023-09-18T17:39:11Z
dc.date.available: 2023-09-18T17:39:11Z
dc.date.issued: 2018
dc.identifier.issn: 0868-4952
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/124800
dc.description.abstract: In this paper, we propose a framework for extracting translation memory from a corpus of fiction and non-fiction books. In recent years, there have been several proposals to align bilingual corpora and extract translation memory from legal and technical documents. Yet when it comes to aligning a corpus of translated fiction and non-fiction books, the existing alignment algorithms yield low-precision results. To address this problem, we propose a new method that combines existing alignment algorithms with a proactive learning approach. We define several feature functions that are used to build two classifiers, one for text filtering and one for alignment. We report results on the English-Lithuanian language pair and on a bilingual corpus of 200 books. We demonstrate a significant improvement in alignment accuracy over currently available alignment systems. [eng]
dc.format: PDF
dc.format.extent: p. 693-710
dc.format.medium: text / txt
dc.language.iso: eng
dc.relation.isreferencedby: MatSciNet
dc.relation.isreferencedby: Zentralblatt MATH (zbMATH)
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.source.uri: https://www.mii.lt/informatica/pdf/INFO1200.pdf
dc.title: Sentence level alignment of digitized books parallel corpora
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.references: 20
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.institution: Vilniaus universitetas Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.contributor.faculty: Verslo vadybos fakultetas / Faculty of Business Management
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.en: alignment of corpora
dc.subject.en: alignment of digitized books
dc.subject.en: machine translation
dc.subject.en: natural language processing
dcterms.sourcetitle: Informatica
dc.description.issue: no. 4
dc.description.volume: vol. 29
dc.publisher.name: Vilniaus universitetas Matematikos ir informatikos institutas
dc.publisher.city: Vilnius
dc.identifier.doi: 10.15388/Informatica.2018.188
dc.identifier.elaba: 32718030

