Abstract
To address the effective retrieval of Uyghur web documents, a retrieval word weighting method based on relevance feedback and document similarity is proposed. First, the Uyghur documents are preprocessed to obtain the corresponding stem set. Then, when the user enters multiple retrieval words, an initial search is executed and the top N ranked documents are extracted, following the idea of local relevance feedback. Next, the TF-IDF algorithm is used to compute the term-frequency similarity between the retrieval words and the feedback documents, and cosine distance is used to compute the similarity between documents; the retrieval words are weighted twice on this basis. Finally, document retrieval is performed with the weighted retrieval words. Experimental results show that the proposed method accurately retrieves the documents the user needs and ranks them near the top.
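The pipeline in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's weighting formulas, so everything specific here is an assumption made for illustration: the smoothed IDF, the use of each feedback document's mean cosine similarity to the other feedback documents as the second weighting factor, and the function names `tfidf_vectors`, `cosine`, `rank`, and `weight_query_terms`. The input tokens are placeholder stems, not the output of a real Uyghur stemmer.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {stem: TF-IDF weight} dict per document (smoothed IDF, an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append({
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(vectors, term_weights):
    """Return document indices sorted by weighted sum of query-term TF-IDF, best first."""
    scores = [
        sum(w * vec.get(t, 0.0) for t, w in term_weights.items())
        for vec in vectors
    ]
    return sorted(range(len(vectors)), key=lambda i: -scores[i])


def weight_query_terms(query, vectors, top_n):
    """Weight query terms from the top-N feedback documents (illustrative scheme)."""
    # Initial retrieval with uniform term weights, then keep the top N documents.
    feedback = rank(vectors, {t: 1.0 for t in query})[:top_n]

    # Document-similarity factor: mean cosine similarity to the other feedback documents.
    centrality = {}
    for i in feedback:
        others = [j for j in feedback if j != i]
        if others:
            centrality[i] = sum(cosine(vectors[i], vectors[j]) for j in others) / len(others)
        else:
            centrality[i] = 1.0  # a single feedback document gets a neutral factor

    # Combine the TF-IDF factor with the document-similarity factor for each term.
    return {
        t: sum(centrality[i] * vectors[i].get(t, 0.0) for i in feedback)
        for t in set(query)
    }


# Placeholder stem lists standing in for preprocessed documents.
docs = [
    ["kitab", "oqu", "kitab", "mektep"],
    ["kitab", "mektep", "oqughuchi"],
    ["bazar", "nan", "alma"],
    ["alma", "bazar", "kitab"],
]
vectors = tfidf_vectors(docs)
weights = weight_query_terms(["kitab", "mektep"], vectors, top_n=2)
ranking = rank(vectors, weights)
print(weights)
print(ranking)
```

In this sketch, a query term gains weight when it is prominent in feedback documents that also resemble one another, which is one plausible reading of weighting by both term frequency and document similarity. It should not be taken as the authors' exact scheme.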
Authors
YU Li (于丽)
YASEN·AIZEZI (亚森·艾则孜)
Department of Information Security Engineering, Xinjiang Police College, Urumqi 830011, China
Source
《华侨大学学报(自然科学版)》
Peking University Core Journal (北大核心)
2017, Issue 3, pp. 408-413 (6 pages)
Journal of Huaqiao University (Natural Science)
Funding
Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2015211A016)
Keywords
Uyghur
document retrieval
retrieval word weighting
relevance feedback
document similarity