Abstract
A text categorization model based on latent semantic analysis (LSA) and the hyper-sphere support vector machine (HS-SVM) is proposed to improve the accuracy and efficiency of text categorization. Because a standard SVM converges slowly on large-scale text collections, the HS-SVM is applied to text categorization, with an incremental-learning-based HS-SVM algorithm used for training and classification. Experimental results show that the HS-SVM is an effective way to address this weakness of the SVM: in text categorization it achieves accuracy comparable to that of the SVM while markedly reducing model complexity and training time.
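The hyper-sphere idea in the abstract can be illustrated with a minimal sketch: each class is enclosed in a ball, and a new document is assigned to the ball it sits closest to. This is not the paper's algorithm. It assumes the document vectors have already been reduced by LSA, it swaps in the Badoiu-Clarkson iteration for the enclosing ball instead of solving the HS-SVM quadratic program or using the incremental learning procedure, and the decision rule (distance to the centre divided by the radius) is an assumption. The names `fit_sphere`, `train`, and `classify` are hypothetical.

```python
import math


def fit_sphere(points, iterations=100):
    """Approximate the smallest ball enclosing `points`.

    Uses the Badoiu-Clarkson iteration (an assumption, not the paper's
    incremental HS-SVM training): repeatedly step toward the farthest point.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        # Move the centre toward the farthest point with a shrinking step.
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def train(samples):
    """samples: dict mapping a class label to a list of feature vectors."""
    return {label: fit_sphere(vectors) for label, vectors in samples.items()}


def classify(model, x):
    """Pick the class whose sphere is relatively closest (assumed rule)."""
    def relative_distance(label):
        center, radius = model[label]
        # Guard against a zero radius when a class has a single point.
        return math.dist(x, center) / max(radius, 1e-12)
    return min(model, key=relative_distance)


# Toy 2-D vectors standing in for LSA-reduced documents (made-up data).
samples = {
    "sports": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "finance": [[5, 5], [6, 5], [5, 6], [6, 6]],
}
model = train(samples)
label = classify(model, [0.4, 0.6])
```

Storing only a centre and a radius per class is what keeps the model small in this sketch. A real HS-SVM would additionally allow slack for outliers and a kernel, which this example omits.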
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
PKU Core Journals (北大核心)
2010, No. 7, pp. 104-107 (4 pages)
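Funding
One of the outcomes of the Major Project of the Key Research Institutes of Humanities and Social Sciences of the Ministry of Education, "Research on Knowledge Mining Technology and Its Applications Based on Intelligent Information Processing"
Project No. 08JJD870225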
Keywords
text categorization
latent semantic analysis
support vector machine