Sentence level alignment of digitized books parallel corpora

Date
2018
Author
Laukaitis, Algirdas
Plikynas, Darius
Ostašius, Egidijus
Abstract
In this paper, we propose a framework for extracting translation memory from a corpus of fiction and non-fiction books. In recent years, there have been several proposals for aligning bilingual corpora and extracting translation memory from legal and technical documents. Yet when it comes to aligning a corpus of translated fiction and non-fiction books, existing alignment algorithms give low-precision results. To address this problem, we propose a new method that combines existing alignment algorithms with a proactive learning approach. We define several feature functions that are used to build two classifiers for text filtering and alignment. We report results on the English-Lithuanian language pair and on a bilingual corpus from 200 books. We demonstrate a significant improvement in alignment accuracy over currently available alignment systems.
Issue date (year)
2018
URI
https://etalpykla.vilniustech.lt/handle/123456789/124800
Collections
  • Straipsniai Web of Science ir/ar Scopus referuojamuose leidiniuose / Articles in Web of Science and/or Scopus indexed sources
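
The abstract mentions feature functions that feed classifiers for text filtering and alignment, but this record does not list them. As a rough illustration only, the sketch below shows what simple feature functions over a candidate sentence pair might look like. The function names and the three features (length ratio, shared numbers, matching end punctuation) are assumptions chosen for illustration and are not taken from the paper.

```python
import re


def length_ratio(src: str, tgt: str) -> float:
    """Shorter-to-longer character length ratio, in the range 0..1."""
    longer = max(len(src), len(tgt))
    if longer == 0:
        return 1.0  # two empty strings are treated as a perfect match
    return min(len(src), len(tgt)) / longer


def shared_numbers(src: str, tgt: str) -> float:
    """Jaccard overlap of the digit sequences found in each sentence."""
    src_nums = set(re.findall(r"\d+", src))
    tgt_nums = set(re.findall(r"\d+", tgt))
    if not src_nums and not tgt_nums:
        return 1.0  # no numbers on either side: nothing contradicts the pair
    return len(src_nums & tgt_nums) / len(src_nums | tgt_nums)


def end_punct_match(src: str, tgt: str) -> float:
    """1.0 if both sentences end with the same character, else 0.0."""
    return 1.0 if src.rstrip()[-1:] == tgt.rstrip()[-1:] else 0.0


def features(src: str, tgt: str) -> list[float]:
    """Hypothetical feature vector for one candidate sentence pair."""
    return [
        length_ratio(src, tgt),
        shared_numbers(src, tgt),
        end_punct_match(src, tgt),
    ]


if __name__ == "__main__":
    # Invented English-Lithuanian pair, used only to exercise the functions.
    print(features("He left in 1918.", "Jis išvyko 1918 metais."))
```

A vector like this could be passed to any standard classifier to decide whether to keep or discard a candidate pair. How the paper's two classifiers are actually built and trained is not described in this record.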

 

 
