Abstract
A text classifier is implemented using supervised machine learning. An improved CHI statistic method is used to extract features from the word segmentation results, the traditional TF-IDF weighting formula is modified (the modified formula is called ETF-IDF), and the classifier is built with a resource-optimizing neural network, RON (Resource-optimizing Networks). Classification experiments are carried out on the Chinese text classification corpus provided by Fudan University. The results show that the classifier achieves higher classification quality than the BP algorithm, and that the ETF-IDF weighting formula outperforms the traditional TF-IDF weighting formula: it improves classification precision and performance and meets the requirements of automatic Chinese text classification.
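For orientation, the two baselines named in the abstract can be sketched as follows. This is a minimal sketch of the textbook CHI statistic and the traditional TF-IDF weight only; the abstract does not define the improved CHI method or the ETF-IDF formula, so neither is reproduced here. The function names `chi_square` and `tf_idf` and their parameters are illustrative assumptions, not taken from the paper.

```python
import math


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Textbook CHI statistic for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    (Baseline form only; not the paper's improved variant.)
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate contingency table: the term carries no information.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def tf_idf(tf: int, n_docs: int, df: int) -> float:
    """Traditional TF-IDF weight: tf * log(N / df).

    tf: term frequency in the document
    n_docs: total number of documents in the corpus
    df: number of documents containing the term
    (Baseline form only; not the paper's ETF-IDF.)
    """
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(n_docs / df)
```

Under these definitions, a term that occurs in every document of one category and in no document of another scores highest on the CHI statistic (for example, `chi_square(10, 0, 0, 10)` evaluates to 20.0), while a term that occurs in every document of the corpus receives a TF-IDF weight of zero.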
Source
《计算机应用与软件》
CSCD
2010, No. 7, pp. 33-36 (4 pages)
Computer Applications and Software
Funding
National Basic Research Program of China (973 Program) (2004CB318108, 2007CB311003)
National Natural Science Foundation of China (60675031)
Keywords
Text classification
CHI statistic
Resource-optimizing network (RON)
Resource-optimizing neural networks