Clustering of templates based hypertext documents by structural similarity

Grenda, Mindaugas

dc.contributor.author	Grenda, Mindaugas
dc.date.accessioned	2023-09-18T08:54:05Z
dc.date.available	2023-09-18T08:54:05Z
dc.date.issued	2013
dc.identifier.uri	https://etalpykla.vilniustech.lt/handle/123456789/108433
dc.description.abstract	Darbe pasiūlyti keturi nauji bendrų kelių metodo pagerinimai, skirti hipertekstinių dokumentų, sugeneruotų iš šablonų, klasterizavimui: atstumų skaičiavimas, remiantis XPATH, prieš tai pašalinus gretimų DOM medžio elementų indeksus iš XPATH; XPATH rinkinio dalies pašalinimas prieš skaičiuojant atstumus; XPATH kelių pradžios nukirtimas iki tam tikro gylio prieš skaičiuojant atstumus; XPATH, kurie yra bendri visiems klasterizuojamiems dokumentams, pašalinimas. Eksperimentų metu patikrinta, kad visi pasiūlyti metodai pranoksta bazinį metodą, kuriame atstumai skaičiuojami naudojant elementų XPATH. Labiausiai klasterizavimo kokybę pagerina elementų indeksų pašalinimas iš XPATH. Taip pat pasiūlytas metodas, kaip identifikuoti reikiamą kiekį klasterių, iš klasterizavimo metu gautos hierarchinės klasterių struktūros. Šis metodas klasterius identifikuoja geriau nei plačiai žinomi metodai: klasterių parinkimo pagal ribinę reikšmę ir klasterių parinkimo pagal medžio viršūnės gylį.	lit
dc.description.abstract	The thesis proposes four new improvements to the common paths method for clustering template-based hypertext documents by structural similarity: measuring distances using XPATHs from which the sibling indexes of DOM tree elements have been removed; removing part of the XPATH set before measuring distances; cutting off the beginning of each XPATH up to a certain depth before measuring distances; and removing XPATHs that are common to all clustered documents. Experiments showed that all proposed methods outperform the baseline method, which measures distances using plain element XPATHs. Removing sibling indexes from XPATHs improves clustering quality the most. The thesis also proposes a method for identifying the required number of clusters from the hierarchical cluster structure obtained during clustering. This method identifies clusters more accurately than well-known methods such as selecting clusters by a threshold value or by the depth of a node in the tree.	eng
dc.format	PDF
dc.format.extent	59 p.
dc.format.medium	tekstas / txt
dc.language.iso	lit
dc.rights	Available only on the institution's intranet
dc.source.uri	https://talpykla.elaba.lt/elaba-fedora/objects/elaba:1772404/datastreams/MAIN/content
dc.title	Iš šablonų sugeneruotų hipertekstinių dokumentų klasterizavimas pagal struktūrinį panašumą
dc.title.alternative	Clustering of templates based hypertext documents by structural similarity
dc.type	Magistro darbas / Master thesis
dc.type.pubtype	ETD_MGR - Magistro darbas / Master thesis
dc.contributor.institution	Vilniaus Gedimino technikos universitetas
dc.subject.researchfield	T 007 - Informatikos inžinerija / Informatics engineering
dc.subject.lt	klasterizavimas
dc.subject.lt	html
dc.subject.lt	xpath
dc.subject.lt	struktūrinis panašumas
dc.subject.en	clustering
dc.subject.en	html
dc.subject.en	xpath
dc.subject.en	structural similarity
dc.publisher.name	Lithuanian Academic Libraries Network (LABT)
dc.publisher.city	Kaunas
dc.identifier.elaba	1772404
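
The XPATH preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names are hypothetical, the example paths are made up, and the Jaccard distance over path sets is an assumption standing in for whatever distance the common paths method actually uses.

```python
import re


def strip_sibling_indexes(xpath):
    """Drop positional predicates such as [2] from every step of the path."""
    return re.sub(r"\[\d+\]", "", xpath)


def cut_prefix(xpath, depth):
    """Drop the first `depth` steps of an absolute path."""
    steps = xpath.strip("/").split("/")
    return "/" + "/".join(steps[depth:])


def remove_common(path_sets):
    """Remove paths that occur in every document (expects a non-empty list of sets)."""
    common = set.intersection(*path_sets)
    return [paths - common for paths in path_sets]


def jaccard_distance(a, b):
    """Assumed distance: 1 - |A & B| / |A | B| over two sets of paths."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Two made-up pages from the same template; doc_a has one extra paragraph.
doc_a = {"/html/body/div[1]/h1", "/html/body/div[2]/p[1]", "/html/body/div[2]/p[2]"}
doc_b = {"/html/body/div[1]/h1", "/html/body/div[2]/p[1]"}

raw_distance = jaccard_distance(doc_a, doc_b)

norm_a = {strip_sibling_indexes(p) for p in doc_a}
norm_b = {strip_sibling_indexes(p) for p in doc_b}
normalized_distance = jaccard_distance(norm_a, norm_b)
```

In this toy case the raw path sets differ only by a repeated paragraph, so the two pages are at a nonzero distance until the sibling indexes are stripped, after which their path sets coincide. This is consistent with the abstract's finding that index removal helps the most, though the real experiments are described only in the thesis itself.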

Files in this item

Name: darbas.pdf
Size: 1.127Mb
Format: PDF

This item appears in the following collection(s)

Magistrų darbai / Master theses [2734]