dc.contributor.author: Griazev, Kiril
dc.contributor.author: Ramanauskaitė, Simona
dc.date.accessioned: 2023-09-18T20:43:21Z
dc.date.available: 2023-09-18T20:43:21Z
dc.date.issued: 2021
dc.identifier.issn: 2076-3417
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/152033
dc.description.abstract: The need for automated data extraction is continuously growing due to the constant addition of information to the worldwide web. Researchers are developing new data extraction methods to achieve increased performance compared to existing methods. Comparing algorithms to evaluate their performance is vital when developing new solutions. Different algorithms require different datasets to test their performance due to the various data extraction approaches. Currently, most datasets tend to focus on a specific data extraction approach. Thus, they generally lack the data that may be useful for other extraction methods. That leads to difficulties when comparing the performance of algorithms that are vastly different in their approach. We propose a dataset of web page content blocks that includes various data points to counter this. We also validate its design and structure by performing block labeling experiments. Web developers of varying experience levels labeled multiple websites presented to them. Their labeling results were stored in the newly proposed dataset structure. The experiment proved the need for proposed data points and validated dataset structure suitability for multi-purpose dataset design. [eng]
dc.format: PDF
dc.format.extent: p. 1-13
dc.format.medium: text / txt
dc.language.iso: eng
dc.relation.isreferencedby: Science Citation Index Expanded (Web of Science)
dc.relation.isreferencedby: DOAJ
dc.relation.isreferencedby: Scopus
dc.rights: Freely available online
dc.source.uri: https://doi.org/10.3390/app11083319
dc.source.uri: https://talpykla.elaba.lt/elaba-fedora/objects/elaba:90129513/datastreams/MAIN/content
dc.source.uri: 10.3390/app11083319
dc.title: Multi-purpose dataset of webpages and its content blocks: Design and structure validation
dc.type: Straipsnis Web of Science DB / Article in Web of Science DB
dcterms.accessRights: This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
dcterms.license: Creative Commons – Attribution – 4.0 International
dcterms.references: 24
dc.type.pubtype: S1 - Straipsnis Web of Science DB / Web of Science DB article
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield: T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.researchfield: N 009 - Informatika / Computer science
dc.subject.vgtuprioritizedfields: IK0303 - Dirbtinio intelekto ir sprendimų priėmimo sistemos / Artificial intelligence and decision support systems
dc.subject.ltspecializations: L106 - Transportas, logistika ir informacinės ir ryšių technologijos (IRT) / Transport, logistic and information and communication technologies
dc.subject.en: web
dc.subject.en: block
dc.subject.en: dataset
dc.subject.en: labeling
dcterms.sourcetitle: Applied Sciences
dc.description.issue: iss. 8
dc.description.volume: vol. 11
dc.publisher.name: MDPI
dc.publisher.city: Basel
dc.identifier.doi: 000643951000001
dc.identifier.doi: 10.3390/app11083319
dc.identifier.elaba: 90129513

