dc.contributor.author: Grigalis, Tomas
dc.date.accessioned: 2023-09-18T19:47:35Z
dc.date.available: 2023-09-18T19:47:35Z
dc.date.issued: 2013
dc.identifier.other: (BIS)VGT02-000026698
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/143224
dc.description.abstract (eng): In this paper we present ongoing PhD research on unsupervised and domain-independent structured data extraction from the Web. We propose a novel method to extract structured data records from template-generated Web pages. The method is based on clustering visually similar Web page elements by exploiting their visual formatting and HTML structural features. Tag paths of clustered Web page elements are then employed to derive extraction rules. These rules, called wrappers, can later be reused on thousands of Web pages generated from the same template. This opens the possibility for the proposed method to be deployed in Web-scale structured data extraction systems.
dc.format: PDF
dc.format.extent: p. 753-758
dc.format.medium: text / txt
dc.language.iso: eng
dc.title: Towards web-scale structured web data extraction
dc.type: Paper published in peer-reviewed conference publication
dcterms.references: 34
dc.type.pubtype: P1d - Article published in peer-reviewed conference proceedings
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Faculty of Fundamental Sciences
dc.contributor.department: Department of Information Systems
dc.subject.researchfield: T 007 - Informatics engineering
dcterms.sourcetitle: Web Search and Data Mining (WSDM'13) : proceedings of the sixth ACM international conference
dc.publisher.name: ACM
dc.publisher.city: New York
dc.identifier.doi: 10.1145/2433396.2433491
dc.identifier.elaba: 4030629


Files in this item

There are no files associated with this item.
