Abstract
Text categorization is the foundation and core of text mining, and in recent years it has been a research focus in data mining and Web mining. This article surveys the state of text categorization research in China and abroad from both qualitative and quantitative perspectives and analyzes the important factors that affect text categorization. By summarizing evaluations of text categorization systems and algorithms, it aims to identify problems common to current research and to provide a theoretical and empirical basis for optimizing and improving automatic text categorization.
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2005, No. 5, pp. 46-49, 14 (5 pages in total)
New Technology of Library and Information Service
Keywords
Automatic categorization
Evaluation
Feature selection