Abstract
A cultivated land resource database holds a large volume of data, and the data dimensions differ across regions and data types. Text classification therefore has to handle a very large number of features, which makes classification harder. To improve the accuracy of text classification for cultivated land resource databases, this paper proposes a fuzzy vector space model (VSM) classification method for cultivated land resource database text. Category feature words are first selected from the text with the chi-square (CHI) statistic, and principal component analysis (PCA) is then applied as a second dimensionality reduction step. The reduced text features are analyzed, feature sememes that represent the topic are extracted with suitable rules, and all feature sememes are expanded. A suitable classifier is then selected to build a fuzzy VSM model, which is used to classify the database text. Experimental results show that the proposed method classifies text with high accuracy and high efficiency, improving the overall classification performance.
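The first and last stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes pre-tokenized documents, scores terms with the standard chi-square statistic on document-level presence, and defines the fuzzy membership of a document as its cosine similarity to each class centroid, normalized to sum to one. The PCA step and the sememe extraction and expansion are omitted, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square value over all classes."""
    n = len(docs)
    doc_terms = [set(d) for d in docs]
    vocab = set().union(*doc_terms)
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        has_term = [term in terms for terms in doc_terms]
        best = 0.0
        for cls, size in class_sizes.items():
            a = sum(1 for h, y in zip(has_term, labels) if h and y == cls)
            b = sum(has_term) - a   # term present, other classes
            c = size - a            # term absent, this class
            d = n - a - b - c       # term absent, other classes
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def vectorize(doc, features):
    """Term-frequency vector of a document over the selected features."""
    counts = Counter(doc)
    return [counts[f] for f in features]


def class_centroids(vectors, labels):
    """Mean vector of each class in the vector space model."""
    sums, counts = {}, Counter(labels)
    for vec, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def fuzzy_memberships(vec, centroids):
    """Assumed membership rule: cosine similarities normalized to sum to 1."""
    sims = {y: cosine(vec, c) for y, c in centroids.items()}
    total = sum(sims.values())
    if total == 0:
        return {y: 1.0 / len(sims) for y in sims}
    return {y: s / total for y, s in sims.items()}


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be segmented database text.
    docs = [
        ["soil", "ph", "organic", "matter"],
        ["soil", "nitrogen", "organic"],
        ["irrigation", "canal", "water"],
        ["irrigation", "water", "pump"],
    ]
    labels = ["soil", "soil", "irrigation", "irrigation"]
    features = select_features(chi_square_scores(docs, labels), k=4)
    centroids = class_centroids([vectorize(d, features) for d in docs], labels)
    query = vectorize(["soil", "organic", "water"], features)
    print(features)
    print(fuzzy_memberships(query, centroids))
```

On this toy corpus the query document receives a graded membership in both classes (about 0.67 for soil and 0.33 for irrigation) rather than a single hard label, which is the property a fuzzy classifier adds on top of a plain VSM.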
Authors
李杨
尹飞
惠向晖
LI Yang; YIN Fei; HUI Xiang-hui (College of Information and Management Science, Henan Agricultural University, Zhengzhou, Henan 450046, China)
Source
《计算机仿真》
2024, No. 10, pp. 478-481, 492 (5 pages in total)
Computer Simulation
Funding
Project of the National Key Research and Development Program of China (2017YFD0301105).
Keywords
Fuzzy classification
Cultivated land resources database
Text classification
Data dimension reduction