Clustering visually similar web page elements for structured web data extraction
Date
2012
Authors
Grigalis, Tomas
Radvilavičius, Lukas
Čenys, Antanas
Gordevičius, Juozas
Abstract
We propose ClustVX, a novel approach to structured web data extraction. It clusters visually similar web page elements by exploiting their visual formatting and structural features. The clusters are then used to derive extraction rules. In an experimental evaluation on three publicly available benchmark data sets, ClustVX outperforms state-of-the-art structured data extraction systems.
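
The general idea of grouping elements by structural and visual features and then deriving a rule per group can be illustrated with a minimal sketch. This is not the ClustVX implementation: the `Element` fields, the exact-match clustering key, and the bracketed rule notation are all assumptions made for illustration, and the visual features are assumed to have been read from a rendered page beforehand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical record for one rendered web page element."""
    tag_path: tuple   # structural feature, e.g. ("html", "body", "div", "span")
    font_size: str    # visual features (assumed to come from a browser)
    font_weight: str
    color: str
    text: str


def cluster_elements(elements):
    """Group elements whose tag path and visual features match exactly."""
    clusters = defaultdict(list)
    for el in elements:
        key = (el.tag_path, el.font_size, el.font_weight, el.color)
        clusters[key].append(el)
    return dict(clusters)


def derive_rule(key):
    """Render a cluster key as an illustrative, XPath-like rule string.

    The bracketed style predicates are made-up notation, not valid XPath.
    """
    tag_path, font_size, font_weight, color = key
    path = "/" + "/".join(tag_path)
    return f"{path}[font-size={font_size}][font-weight={font_weight}][color={color}]"


PATH = ("html", "body", "div", "span")
elements = [
    Element(PATH, "16px", "bold", "#000", "Laptop A"),
    Element(PATH, "12px", "normal", "#c00", "$499"),
    Element(PATH, "16px", "bold", "#000", "Laptop B"),
    Element(PATH, "12px", "normal", "#c00", "$599"),
]

clusters = cluster_elements(elements)
for key, members in clusters.items():
    print(derive_rule(key), [el.text for el in members])
```

In this toy input all four elements share the same tag path, so structure alone cannot separate product names from prices; adding the visual features to the key splits them into two clusters.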