
dc.contributor.author: Grigalis, Tomas
dc.contributor.author: Čenys, Antanas
dc.date.accessioned: 2023-09-18T19:15:38Z
dc.date.available: 2023-09-18T19:15:38Z
dc.date.issued: 2012
dc.identifier.other: (BIS)VGT02-000025148
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/137400
dc.description.abstract: Record segmentation is a core problem in structured web data extraction. In this paper we present a novel technique that segments structured web data into individual data records that come from an underlying database. The proposed technique exploits visual as well as structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to segment data records. During the segmentation process the technique also generates XPath expressions. These expressions can later be used to extract data records directly from web pages generated from the same template, without repeating the clustering and segmentation processes. The extracted structured data can be reused in a wide range of applications, such as price comparison portals, meta-search, and knowledge bases. Experimental evaluation of the proposed technique on three publicly available benchmark data sets demonstrates nearly perfect precision and recall.
dc.format: PDF
dc.format.extent: p. 38-47
dc.format.medium: tekstas / txt
dc.language.iso: eng
dc.relation.ispartofseries: Communications in Computer and Information Science vol. 319 1865-0929 1865-0937
dc.relation.isreferencedby: Conference Proceedings Citation Index - Science (Web of Science)
dc.relation.isreferencedby: SpringerLink
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: MathSciNet
dc.source.uri: https://doi.org/10.1007/978-3-642-33308-8_4
dc.title: Generating Xpath expressions for structured web data record segmentation
dc.type: Straipsnis konferencijos darbų leidinyje Web of Science DB / Paper in conference publication in Web of Science DB
dcterms.references: 18
dc.type.pubtype: P1a - Straipsnis konferencijos darbų leidinyje Web of Science DB / Article in conference proceedings Web of Science DB
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.contributor.department: Informacinių sistemų katedra / Department of Information Systems
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.en: Web data segmentation
dc.subject.en: Structured web data
dc.subject.en: Web data extraction
dc.subject.en: Wrapper induction
dcterms.sourcetitle: Information and software technologies : 18th International Conference, ICIST 2012, Kaunas, Lithuania, September 13-14, 2012 : proceedings
dc.publisher.name: Springer
dc.publisher.city: New York
dc.identifier.doi: 000312463800004
dc.identifier.doi: 10.1007/978-3-642-33308-8_4
dc.identifier.elaba: 3994295
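
The abstract in this record says that the generated expressions can be reused on other pages built from the same template, with no need to repeat clustering and segmentation. The sketch below illustrates only that reuse step and is not the authors' implementation. The two page snippets, the expression in RECORD_XPATH, and the helper extract_records are hypothetical, and the example uses Python's standard-library ElementTree, which supports only a subset of XPath and needs well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages produced by the same template (kept well-formed so the
# standard-library XML parser can read them).
PAGE_A = """<html><body>
<div class="results">
  <div class="item"><span class="name">Laptop</span><span class="price">499</span></div>
  <div class="item"><span class="name">Phone</span><span class="price">299</span></div>
</div>
</body></html>"""

PAGE_B = """<html><body>
<div class="results">
  <div class="item"><span class="name">Tablet</span><span class="price">199</span></div>
</div>
</body></html>"""

# Hypothetical expression of the kind a segmentation step might emit.
# It stays within the XPath subset that ElementTree understands.
RECORD_XPATH = ".//div[@class='results']/div[@class='item']"


def extract_records(page_source, record_xpath):
    """Return one list of text fields per data record matched by record_xpath."""
    root = ET.fromstring(page_source)
    records = []
    for node in root.findall(record_xpath):
        # Collect the non-empty text fragments inside the record node.
        fields = [text.strip() for text in node.itertext() if text.strip()]
        records.append(fields)
    return records


# The same stored expression is applied to both pages; no clustering is redone.
records_a = extract_records(PAGE_A, RECORD_XPATH)
records_b = extract_records(PAGE_B, RECORD_XPATH)
```

Real web pages are rarely well-formed XML, so a practical version would likely swap in an HTML-tolerant parser with fuller XPath support; that choice is outside what this record describes.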


