
dc.contributor.author: Grigalis, Tomas
dc.date.accessioned: 2023-09-18T19:47:35Z
dc.date.available: 2023-09-18T19:47:35Z
dc.date.issued: 2012
dc.identifier.other: (BIS)VGT02-000026697
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/143223
dc.description.abstract [eng]: Automatic extraction of structured data from web pages is one of the key challenges Web search engines must overcome to advance to a more expressive, semantic level. Here we propose a novel data extraction method called ClustVX. It exploits both visual and structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to derive data extraction rules. Preliminary evaluation of the ClustVX system on three public benchmark datasets demonstrates high efficiency and indicates the need for a much larger, up-to-date benchmark dataset that reflects contemporary Web 2.0 pages.
dc.format.extent: pp. 197-201
dc.format.medium: text (txt)
dc.language.iso: eng
dc.source.uri: http://ceur-ws.org/Vol-924/paper18.pdf
dc.title: Towards automatic structured web data extraction system
dc.type: Paper published in peer-reviewed conference proceedings
dcterms.references: 12
dc.type.pubtype: P1d - Article published in peer-reviewed conference proceedings
dc.contributor.institution: Vilniaus Gedimino technikos universitetas (Vilnius Gediminas Technical University)
dc.contributor.faculty: Faculty of Fundamental Sciences
dc.contributor.department: Department of Information Systems
dc.subject.researchfield: T 007 - Informatics engineering
dc.subject.en: Information extraction
dc.subject.en: Structured web data
dc.subject.en: Deep web
dcterms.sourcetitle: Databases and Information Systems (Baltic DB&IS 2012): tenth international Baltic conference on Databases and Information Systems: local proceedings, materials of doctoral consortium, Vilnius, July 8-11, 2012
dc.publisher.name: Žara
dc.publisher.city: Vilnius
dc.identifier.elaba: 4030603


Files in this item

There are no files associated with this item.
