dc.contributor.author: Nekrašaitė-Liegė, Vilma
dc.contributor.author: Čiginas, Andrius
dc.contributor.author: Krapavickaitė, Danutė
dc.date.accessioned: 2023-09-18T16:21:59Z
dc.date.available: 2023-09-18T16:21:59Z
dc.date.issued: 2022
dc.identifier.uri: https://etalpykla.vilniustech.lt/handle/123456789/113485
dc.description.abstract: The growing number of data sources raises the task of integrating them with the traditional data sources used in official statistics. One problem under study at Statistics Lithuania is to revise some indicators and determine whether their accuracy can be improved using data from additional sources. The proportion of enterprises that have a website is one such indicator. Traditionally, it is estimated from the Information and Communication Technology (ICT) sample survey. Information about enterprise website ownership is also provided by a private company. However, this data source is updated on a voluntary basis and does not cover the whole population, so an estimator based on it alone is expected to be biased (Tam and Kim, 2018). Another way to build a list of enterprises owning websites is web scraping (ESSnet Big Data I, ESSnet Big Data II). Following a common methodology, a search engine is applied to the population to find ten candidate URLs for each enterprise. A logistic regression model is used to estimate the probability that the selected URL is the website of the particular enterprise. If this probability reaches a fixed threshold, the enterprise is concluded to own a website; otherwise, it is concluded not to own one. However, other research reports that the accuracy of such a classification ranges from about 59 to 89 percent and depends on the search engine, the training sample, and other factors. It may therefore seem that collecting website data through the ICT survey cannot be abandoned; however, combining different sources may lead to more efficient estimators. See Beaumont (2020), Kim and Tam (2021) and Rao (2021), among others. In this research, a number of methods for integrating auxiliary data from alternative sources with the survey data for bias adjustment are examined.
The integration leads to more efficient estimators than those based on the survey data alone. The accuracy measures of the estimators considered are evaluated.
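The classification rule described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature names, the coefficients, the 0.5 threshold, and the choice of the best-scoring candidate as the "selected URL" are all hypothetical, and a real model would be fitted on a labelled training sample.

```python
import math

# Hypothetical coefficients, for illustration only (not from the abstract).
INTERCEPT = -2.0
WEIGHTS = {"name_in_domain": 2.5, "address_on_page": 1.5, "search_rank": -0.2}
THRESHOLD = 0.5  # assumed fixed threshold


def url_probability(features):
    """Logistic model: estimated probability that a URL belongs to the enterprise."""
    score = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))


def owns_website(candidate_urls, threshold=THRESHOLD):
    """Classify an enterprise from the feature dicts of its candidate URLs.

    Assumption: the 'selected URL' is the candidate with the highest probability.
    """
    if not candidate_urls:
        return False
    best = max(url_probability(features) for features in candidate_urls)
    return best >= threshold


def scraped_proportion(population):
    """Naive share of enterprises classified as owning a website."""
    flags = [owns_website(urls) for urls in population.values()]
    return sum(flags) / len(flags)


# Toy population: enterprise id -> candidate URLs found by a search engine.
population = {
    "A": [
        {"name_in_domain": 1, "address_on_page": 1, "search_rank": 1},
        {"name_in_domain": 0, "address_on_page": 0, "search_rank": 2},
    ],
    "B": [{"name_in_domain": 0, "address_on_page": 0, "search_rank": 1}],
    "C": [{"name_in_domain": 1, "address_on_page": 0, "search_rank": 3}],
    "D": [],  # no candidate URLs found
}

print(scraped_proportion(population))
```

Because the classifier misclassifies some enterprises, a proportion computed this way is itself subject to error, which is why the abstract considers combining the scraped flags with survey data rather than using them alone.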
dc.format.extent: p. 51-52
dc.format.medium: text / txt
dc.language.iso: eng
dc.rights: Freely available online
dc.source.uri: https://wiki.helsinki.fi/display/BNU/Workshop+on+Survey+Statistics+2022+Scientific+Programme?preview=/406850775/438215614/BNU%202022%20Proceedings.pdf
dc.source.uri: https://talpykla.elaba.lt/elaba-fedora/objects/elaba:138918140/datastreams/MAIN/content
dc.source.uri: https://talpykla.elaba.lt/elaba-fedora/objects/elaba:138918140/datastreams/COVER/content
dc.title: Usage of non-probability sample and scraped data to estimate proportions
dc.type: Konferencijos pranešimo santrauka / Conference presentation abstract
dcterms.references: 6
dc.type.pubtype: T2 - Konferencijos pranešimo tezės / Conference presentation abstract
dc.contributor.institution: Lietuvos statistikos departamentas; Vilniaus Gedimino technikos universitetas
dc.contributor.institution: Lietuvos statistikos departamentas; Vilniaus universitetas
dc.contributor.institution: Vilniaus Gedimino technikos universitetas
dc.contributor.faculty: Fundamentinių mokslų fakultetas / Faculty of Fundamental Sciences
dc.subject.researchfield: N 001 - Matematika / Mathematics
dc.subject.studydirection: A03 - Statistika / Statistics
dc.subject.vgtuprioritizedfields: FM0101 - Fizinių, technologinių ir ekonominių procesų matematiniai modeliai / Mathematical models of physical, technological and economic processes
dc.subject.ltspecializations: L103 - Įtrauki ir kūrybinga visuomenė / Inclusive and creative society
dc.subject.en: big data
dc.subject.en: coverage bias
dc.subject.en: post-stratification
dc.subject.en: calibration weighting
dc.subject.en: accuracy estimation
dcterms.sourcetitle: Baltic-Nordic-Ukrainian workshop on survey statistics 2022, August 23-26, 2022, Tartu, Estonia
dc.publisher.name: Statistics Estonia
dc.publisher.city: Tartu
dc.identifier.elaba: 138918140

