Abstract
Concept-based text categorization is a recently proposed approach that addresses some shortcomings of earlier keyword-based methods and handles synonyms and polysemous words comparatively well. However, concept-based methods often handle personal names, institution names, and similar words poorly, even though such words carry category information. This paper proposes a new concept-based categorization method that combines a semantic dictionary with a proper-noun dictionary made up of personal and institution names that frequently appear in text. Experiments verify the effectiveness of the method.
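The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the dictionaries, the feature weighting, or the similarity measure, so the toy dictionaries `CONCEPTS` and `PROPER_NOUNS`, the raw-count weighting, the cosine similarity, and every function name below are assumptions made for illustration only. Words found in the semantic dictionary are collapsed to a shared concept, words found in the proper-noun dictionary are kept as features of their own, and a KNN vote assigns the category.

```python
import math
from collections import Counter

# Hypothetical toy dictionaries; the paper's actual resources are not given in the abstract.
CONCEPTS = {"car": "VEHICLE", "automobile": "VEHICLE", "match": "GAME", "game": "GAME"}
PROPER_NOUNS = {"acme_motors", "river_fc"}  # invented institution names


def to_features(tokens):
    """Map tokens to concept features, keeping proper nouns as their own features."""
    feats = Counter()
    for tok in tokens:
        if tok in PROPER_NOUNS:
            feats["NAME:" + tok] += 1               # proper noun kept as-is
        elif tok in CONCEPTS:
            feats["CONCEPT:" + CONCEPTS[tok]] += 1  # synonyms share one concept
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(tokens, training, k=3):
    """Return the label with the largest similarity-weighted vote among the k nearest documents."""
    query = to_features(tokens)
    scored = sorted(
        ((cosine(query, to_features(doc)), label) for doc, label in training),
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Invented training documents, already tokenized.
TRAINING = [
    (["car", "acme_motors"], "auto"),
    (["automobile"], "auto"),
    (["match", "river_fc"], "sports"),
    (["game"], "sports"),
]

print(knn_classify(["automobile", "acme_motors"], TRAINING))
```

In this sketch "car" and "automobile" produce the same concept feature, which is how a concept-based representation treats synonyms, while "acme_motors" survives as a feature only because it appears in the proper-noun dictionary.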
Source
《微机发展》
2005, Issue 3, pp. 11-13, 56 (4 pages in total)
Microcomputer Development
Funding
Supported by the Natural Science Foundation of Hebei Province (Grant No. F2004000132)
Keywords
text categorization
concept categorization
KNN