Abstract
The Chinese web page clustering system is designed to mine knowledge quickly and effectively from dynamic collections of web text. Built on classical clustering algorithms, the system compares the similarity between web pages, groups highly similar pages into clusters, and submits the results to the user interface for display. By applying the uncertainty degree from fuzzy mathematics, it expresses the fuzziness of a sample's membership in different clusters. This reflects the real situation more faithfully than earlier approaches, which assign each page definitively to a single cluster and therefore capture incomplete information. As a result, the clustering is more objective and query results are improved.
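The abstract does not specify the similarity measure or the fuzzy clustering formula, so the following is only a minimal sketch under stated assumptions. It assumes pages are already segmented into tokens, compares them by cosine similarity on raw term-frequency vectors, and expresses a page's uncertain cluster membership with the standard fuzzy c-means membership formula (fuzzifier `m = 2`), using `1 - cosine similarity` as the distance. The function names and the given cluster centroids are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Raw term-frequency vector for one page (tokens assumed pre-segmented)."""
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def fuzzy_memberships(doc, centroids, m=2.0, eps=1e-12):
    """Fuzzy c-means style membership of one page in each cluster.

    Distance is 1 - cosine similarity; the returned degrees sum to 1,
    so a page can belong partly to several clusters instead of exactly one.
    """
    dists = [max(1.0 - cosine_similarity(doc, c), 0.0) for c in centroids]
    # A page that coincides with a centroid gets full membership in that cluster.
    for i, d in enumerate(dists):
        if d < eps:
            return [1.0 if j == i else 0.0 for j in range(len(dists))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]


if __name__ == "__main__":
    # Hypothetical centroids for two clusters, e.g. from a prior clustering pass.
    centroids = [
        term_vector(["clustering", "algorithm", "web"]),
        term_vector(["retrieval", "query", "index"]),
    ]
    page = term_vector(["clustering", "web", "query"])
    print(fuzzy_memberships(page, centroids))
```

For the sample page, the memberships come out at roughly 0.8 and 0.2: the page leans toward the first cluster but keeps a nonzero degree in the second, which is the kind of graded membership the abstract contrasts with a hard single-cluster assignment.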
Source
Journal of Jiangsu Radio & Television University (《江苏广播电视大学学报》)
2007, No. 3, pp. 55-57 (3 pages)
Keywords
text mining
clustering
information retrieval
web page
web mining