• Lietuvių
    • English
  • English 
    • Lietuvių
    • English
  • Login
View Item 
  •   DSpace Home
  • Mokslinės publikacijos (PDB) / Scientific publications (PDB)
  • Konferencijų publikacijos / Conference Publications
  • Konferencijų straipsniai / Conference Articles
  • View Item
  •   DSpace Home
  • Mokslinės publikacijos (PDB) / Scientific publications (PDB)
  • Konferencijų publikacijos / Conference Publications
  • Konferencijų straipsniai / Conference Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Generating Xpath expressions for structured web data record segmentation

Thumbnail
Date
2012
Author
Grigalis, Tomas
Čenys, Antanas
Metadata
Show full item record
Abstract
Record segmentation is a core problem in structured web data extraction. In this paper we present a novel technique that segments structured web data into individual data records that come from underlying database. Proposed technique exploits visual as well as structural features of web page elements to group them into semantically similar clusters. Resulting clusters reflect the page structure and are used to segment data records. During the segmentation process the technique also generates Xpath expressions. These expressions can be later used to directly extract data records from same template generated web pages without need to redo all the clustering and segmentation processes. Extracted structured data can be reused in wide range of applications, such as price comparison portals, meta-searching, knowledge bases and etc. The experimental evaluation results of proposed technique system on three publicly available benchmark data sets demonstrate nearly perfect results in terms of precision and recall.
Issue date (year)
2012
URI
https://etalpykla.vilniustech.lt/handle/123456789/137400
Collections
  • Konferencijų straipsniai / Conference Articles [15192]

 

 

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjects / KeywordsInstitutionFacultyDepartment / InstituteTypeSourcePublisherType (PDB/ETD)Research fieldStudy directionVILNIUS TECH research priorities and topicsLithuanian intelligent specializationThis CollectionBy Issue DateAuthorsTitlesSubjects / KeywordsInstitutionFacultyDepartment / InstituteTypeSourcePublisherType (PDB/ETD)Research fieldStudy directionVILNIUS TECH research priorities and topicsLithuanian intelligent specialization

My Account

LoginRegister