• Lietuvių
    • English
  • English 
    • Lietuvių
    • English
  • Login
View Item 
  •   DSpace Home
  • Mokslinės publikacijos (PDB) / Scientific publications (PDB)
  • Konferencijų publikacijos / Conference Publications
  • Konferencijų straipsniai / Conference Articles
  • View Item
  •   DSpace Home
  • Mokslinės publikacijos (PDB) / Scientific publications (PDB)
  • Konferencijų publikacijos / Conference Publications
  • Konferencijų straipsniai / Conference Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Towards web-scale structured web data extraction

Thumbnail
Date
2013
Author
Grigalis, Tomas
Metadata
Show full item record
Abstract
In this paper we present an ongoing PhD research on unsupervised and domain-independent structured data extraction from the Web. We propose a novel method to extract structured data records from template-generated Web pages. The method is based on clustering visually similar Web page elements by exploiting their visual formatting and HTML structural features. Tag paths of clustered Web page elements are then employed to derive extraction rules. These rules, called wrappers, can be later reused on thousands of same template-generated Web pages. This opens the possibility for the proposed method to be deployed in Web-Scale structured data extraction systems.
Issue date (year)
2013
URI
https://etalpykla.vilniustech.lt/handle/123456789/143224
Collections
  • Konferencijų straipsniai / Conference Articles [15192]

 

 

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjects / KeywordsInstitutionFacultyDepartment / InstituteTypeSourcePublisherType (PDB/ETD)Research fieldStudy directionVILNIUS TECH research priorities and topicsLithuanian intelligent specializationThis CollectionBy Issue DateAuthorsTitlesSubjects / KeywordsInstitutionFacultyDepartment / InstituteTypeSourcePublisherType (PDB/ETD)Research fieldStudy directionVILNIUS TECH research priorities and topicsLithuanian intelligent specialization

My Account

LoginRegister