Abstract
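In text filtering, information diffluence is a powerful means of improving filtering efficiency. This paper therefore proposes a new information diffluence mechanism for Chinese text filtering. Its basic idea is to organize the information needs of different users, after concept expansion, into a tree structure in which their common parts become shared branches. Quantitative matching between texts and profiles is carried out using the proposed side similarity and side matching ratio. This relaxes the strict constraints that the traditional Boolean model places on text-profile matching, compensates for the purely numerical nature of the vector space model, and reflects users' information needs more completely. Experiments show that the mechanism markedly improves filtering efficiency.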
Information diffluence plays an important role in improving the efficiency of text filtering, so this paper presents a new mechanism for it. The main idea is as follows. Based on concept expansion of the keywords given by users, user profiles are automatically organized into a CDT (concept-based decision tree), on which the information diffluence mechanism is built. Segments common to several users are shared in the tree, and quantitative matching between texts and user profiles is implemented using the side similarity and the side matching ratio. Consequently, the mechanism weakens the strict Boolean constraint and overcomes the shortcoming of the vector space model, which focuses only on quantitative factors. As a result, it expresses the information requirements of diverse users more comprehensively and remarkably improves the efficiency of text filtering.
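The shared-branch idea in the abstract can be illustrated with a minimal sketch. It is not the paper's CDT: the abstract does not define the side similarity or the side matching ratio, so neither is implemented here, and matching is reduced to plain concept containment. The names `Node`, `build_tree`, `route`, and `count_nodes`, and the assumption that each profile is an ordered list of concepts from general to specific, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: concept -> child node, plus the users
    # whose (ordered) profile ends at this node.
    children: dict = field(default_factory=dict)
    users: set = field(default_factory=set)


def build_tree(profiles):
    """Merge ordered concept lists so that common prefixes share one branch."""
    root = Node()
    for user, concepts in profiles.items():
        node = root
        for concept in concepts:
            node = node.children.setdefault(concept, Node())
        node.users.add(user)
    return root


def route(root, text_concepts):
    """Return (matched users, nodes visited) for one text.

    Assumption: a branch is followed only if its concept occurs in the
    text. This is strict containment, not the paper's quantitative match.
    """
    present = set(text_concepts)
    matched, visited = set(), 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        matched |= node.users
        for concept, child in node.children.items():
            if concept in present:
                stack.append(child)
    return matched, visited


def count_nodes(node):
    """Count nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


if __name__ == "__main__":
    profiles = {
        "alice": ["text", "filtering"],
        "bob": ["text", "filtering", "chinese"],
        "carol": ["text", "retrieval"],
        "dave": ["image"],
    }
    tree = build_tree(profiles)
    print(count_nodes(tree))
    print(route(tree, ["text", "filtering", "news"]))
```

Because the four example profiles share the `text` and `filtering` branches, one text is checked against the common concepts once rather than once per user, which is the efficiency argument the abstract makes for sharing branches.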
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journals
2000, No. 4, pp. 470-476 (7 pages)
Journal of Computer Research and Development
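Funding
National Natural Science Foundation of China (Grant No. 69675019)
Doctoral Program Foundation of the State Education Commission of China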
Keywords
text filtering
concept expansion
information diffluence
decision tree
information processing
text filtering, vector space model, concept expansion, user profiles, information diffluence, decision tree