Clustering visually similar web page elements for structured web data extraction
Date
2012
Author
Grigalis, Tomas
Radvilavičius, Lukas
Čenys, Antanas
Gordevičius, Juozas
Abstract
We propose ClustVX, a novel approach to extracting structured web data. It clusters visually similar web page elements by exploiting their visual formatting and structural features. The resulting clusters are then used to derive extraction rules. In an experimental evaluation on three publicly available benchmark data sets, ClustVX outperforms state-of-the-art structured data extraction systems.
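
The clustering idea in the abstract can be illustrated with a minimal sketch. This is not the ClustVX implementation: the `Element` type, the choice of features (tag path, font, font size, color), and the exact-match grouping rule are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Element(NamedTuple):
    """Hypothetical description of a rendered web page element."""
    tag_path: str   # structural feature, e.g. "html/body/div/span"
    font: str       # visual features (an assumed selection)
    font_size: int
    color: str
    text: str


def cluster_elements(elements):
    """Group elements that share the same tag path and visual style."""
    clusters = defaultdict(list)
    for el in elements:
        # Assumption: two elements are "visually similar" when every
        # structural and visual feature matches exactly.
        key = (el.tag_path, el.font, el.font_size, el.color)
        clusters[key].append(el.text)
    return dict(clusters)


elements = [
    Element("html/body/div/span", "Arial", 14, "#000", "Laptop A"),
    Element("html/body/div/span", "Arial", 14, "#000", "Laptop B"),
    Element("html/body/div/b", "Arial", 12, "#c00", "$499"),
    Element("html/body/div/b", "Arial", 12, "#c00", "$599"),
]
clusters = cluster_elements(elements)
```

In this toy input the product names and the prices fall into two separate clusters. Each cluster key could then serve as a starting point for an extraction rule for one data field.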