
Towards automatic structured web data extraction system

Date
2012
Author
Grigalis, Tomas
Abstract
Automatic extraction of structured data from web pages is one of the key challenges Web search engines must overcome to advance to a more expressive, semantic level. Here we propose a novel data extraction method called ClustVX. It exploits both visual and structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to derive data extraction rules. Preliminary evaluation of the ClustVX system on three public benchmark datasets demonstrates high efficiency and indicates the need for a much larger, up-to-date benchmark dataset that reflects contemporary Web 2.0 pages.
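The abstract does not specify which visual and structural features ClustVX uses or how it clusters them. The sketch below is a hypothetical illustration of the general idea only, assuming each page element is described by its XPath plus a few rendered style properties (font family, font size, color). The names `Element`, `generalize`, `cluster_key`, and `cluster_elements` are invented for this example, and grouping by exact key equality stands in for whatever similarity-based clustering the actual system performs.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical description of one rendered page element."""
    xpath: str          # e.g. "/html/body/ul/li[2]/span[1]"
    font_family: str    # visual feature (assumed available from a renderer)
    font_size: float    # visual feature, in pixels
    color: str          # visual feature, CSS color string
    text: str


def generalize(xpath: str) -> str:
    """Drop positional indices so elements in sibling records share one path."""
    return re.sub(r"\[\d+\]", "", xpath)


def cluster_key(el: Element) -> tuple:
    """Combine a structural feature (generalized path) with visual features."""
    return (generalize(el.xpath), el.font_family, el.font_size, el.color)


def cluster_elements(elements, min_size=2):
    """Group elements by exact key; keep only repeated patterns.

    Singletons (e.g. a page heading) are unlikely to be record fields.
    """
    clusters = defaultdict(list)
    for el in elements:
        clusters[cluster_key(el)].append(el)
    return {k: v for k, v in clusters.items() if len(v) >= min_size}


# Made-up sample page: two product records plus a page title.
elements = [
    Element("/html/body/ul/li[1]/span[1]", "Arial", 14.0, "#000", "Laptop"),
    Element("/html/body/ul/li[1]/span[2]", "Arial", 12.0, "#c00", "$999"),
    Element("/html/body/ul/li[2]/span[1]", "Arial", 14.0, "#000", "Phone"),
    Element("/html/body/ul/li[2]/span[2]", "Arial", 12.0, "#c00", "$499"),
    Element("/html/body/h1[1]", "Georgia", 24.0, "#000", "Shop"),
]

for key, members in cluster_elements(elements).items():
    # Each surviving key acts as a crude extraction rule for one data field.
    print(key, [m.text for m in members])
```

In this toy page, names and prices share the same generalized path (`/html/body/ul/li/span`), so structure alone would merge them; the visual features (font size and color) split them into two separate fields, which is the intuition behind combining both kinds of features.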
URI
https://etalpykla.vilniustech.lt/handle/123456789/143223
Collections
  • Conference Articles

 

 
