dc.contributor.author | Grigalis, Tomas | |
dc.date.accessioned | 2023-09-18T19:47:35Z | |
dc.date.available | 2023-09-18T19:47:35Z | |
dc.date.issued | 2012 | |
dc.identifier.other | (BIS)VGT02-000026697 | |
dc.identifier.uri | https://etalpykla.vilniustech.lt/handle/123456789/143223 | |
dc.description.abstract | Automatic extraction of structured data from web pages is one of the key challenges for Web search engines to advance to a more expressive semantic level. Here we propose a novel data extraction method called ClustVX. It exploits visual as well as structural features of web page elements to group them into semantically similar clusters. The resulting clusters reflect the page structure and are used to derive data extraction rules. Preliminary evaluation results of the ClustVX system on three public benchmark datasets demonstrate high efficiency and indicate a need for a much larger, up-to-date benchmark dataset that reflects contemporary Web 2.0 pages. | eng |
dc.format.extent | p. 197-201 | |
dc.format.medium | text / txt | |
dc.language.iso | eng | |
dc.source.uri | http://ceur-ws.org/Vol-924/paper18.pdf | |
dc.title | Towards automatic structured web data extraction system | |
dc.type | Straipsnis recenzuotame konferencijos darbų leidinyje / Paper published in peer-reviewed conference publication | |
dcterms.references | 12 | |
dc.type.pubtype | P1d - Straipsnis recenzuotame konferencijos darbų leidinyje / Article published in peer-reviewed conference proceedings | |
dc.contributor.institution | Vilniaus Gedimino technikos universitetas | |
dc.contributor.faculty | Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences | |
dc.contributor.department | Informacinių sistemų katedra / Department of Information Systems | |
dc.subject.researchfield | T 007 - Informatikos inžinerija / Informatics engineering | |
dc.subject.en | Information extraction | |
dc.subject.en | Structured web data | |
dc.subject.en | Deep web | |
dcterms.sourcetitle | Databases and Information Systems (Baltic DB&IS 2012): tenth international Baltic conference on Databases and Information Systems : local proceedings, materials of doctoral consortium, Vilnius, July 8-11, 2012 | |
dc.publisher.name | Žara | |
dc.publisher.city | Vilnius | |
dc.identifier.elaba | 4030603 | |