Show simple item record

dc.contributor.author: Grigalis, Tomas
dc.contributor.author: Čenys, Antanas
dc.date.accessioned: 2023-09-18T20:04:02Z
dc.date.available: 2023-09-18T20:04:02Z
dc.date.issued: 2014
dc.identifier.issn: 0948-695X
dc.identifier.other: (BIS)VGT02-000028249
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/146446
dc.description.abstract: This paper studies structured data extraction from template-generated Web pages. Such pages contain most of the structured data on the Web. Extracted structured data can later be integrated and reused in a very wide range of applications, such as price comparison portals, business intelligence tools, and various mashups. This encourages industry and academia to seek automatic solutions. To tackle the problem of automatic structured Web data extraction, we present a new approach: structured data extraction based on clustering visually similar Web page elements. Our method, called ClustVX, combines visual and pure HTML features of a Web page to cluster visually similar Web page elements and then extract structured Web data. ClustVX can extract structured data from Web pages where more than one data record is present. With extensive experimental evaluation on three benchmark datasets, we demonstrate that ClustVX achieves better results than other state-of-the-art automatic structured Web data extraction methods. [eng]
dc.format: PDF
dc.format.extent: p. 169-192
dc.format.medium: tekstas / txt
dc.language.iso: eng
dc.relation.isreferencedby: Scopus
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.source.uri: http://www.jucs.org/jucs_20_2/unsupervised_structured_data_extraction
dc.subject: IK01 - Informacinės technologijos, ontologinės ir telematikos sistemos / Information technologies, ontological and telematic systems
dc.title: Unsupervised structured data extraction from template-generated web pages
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.references: 46
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.contributor.department: Informacinių sistemų katedra / Department of Information Systems
dc.contributor.department: Taikomosios informatikos institutas / Institute of Applied Computer Science
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.ltspecializations: L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies
dc.subject.en: Deep Web
dc.subject.en: Data extraction
dc.subject.en: Structured web data
dc.subject.en: Wrapper induction
dcterms.sourcetitle: Journal of Universal Computer Science (J.UCS)
dc.description.issue: iss.2
dc.description.volume: Vol. 20
dc.publisher.name: Graz University of Technology
dc.publisher.city: Graz
dc.identifier.doi: 10.2298/CSIS130416020G
dc.identifier.elaba: 4070163


