
Classification model based on mRMR and factorization machines algorithm

Cited by: 3
Abstract: Many researchers have used the Global Terrorism Database (GTD) together with game theory, the k-nearest-neighbor method, and support vector machines to analyze the aggregation of terrorist events, with some success. However, earlier work did not adequately account for data sparsity or for high dimensionality with heavy feature redundancy, both of which can lower the accuracy of aggregation-based classification. This paper proposes TFM, a classification model that combines minimal-redundancy maximal-relevance (mRMR) feature selection with factorization machines (FM): an incremental search finds an approximately optimal feature subset to handle the high dimensionality and redundancy, and FM handles the data sparsity. The TFM model is then used to perform quantitative classification of the preprocessed terrorist-attack data. Four models, naive Bayes (NB), support vector machine (SVM), logistic regression (LR), and TFM, are compared by their Matthews correlation coefficient (MCC). The MCC of TFM is 49.9%, 2.5%, and 2.3% higher than that of NB, SVM, and LR, respectively, which suggests that the TFM model is feasible to some extent.
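The abstract names three ingredients: incremental mRMR feature selection, a factorization machine, and MCC as the evaluation metric. The following is a minimal standard-library sketch of how they can fit together, under assumptions that are not taken from the paper: features are already discretized, mRMR uses the difference form (relevance minus mean redundancy), the FM is a degree-2 model trained by plain SGD on logistic loss, and all hyperparameters (`k`, `lr`, `reg`, `epochs`, `seed`) and the toy data are hypothetical. It illustrates the techniques, not the authors' TFM implementation.

```python
import math
import random
from collections import Counter


def mutual_info(x, y):
    """Mutual information I(X;Y), in nats, between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_select(columns, y, k):
    """Greedy incremental mRMR (difference form): relevance minus mean redundancy."""
    relevance = [mutual_info(col, y) for col in columns]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < k:
        def score(j):
            redundancy = sum(mutual_info(columns[j], columns[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)
        rest = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FactorizationMachine:
    """Degree-2 FM trained by SGD on logistic loss; labels must be -1 or +1."""

    def __init__(self, n_features, k=4, lr=0.05, reg=0.001, epochs=200, seed=0):
        self.rng = random.Random(seed)
        self.w0, self.w = 0.0, [0.0] * n_features
        self.v = [[self.rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]
        self.k, self.lr, self.reg, self.epochs = k, lr, reg, epochs

    def _factor_sums(self, x):
        # s_f = sum_i v_if * x_i, reused by the O(kn) pairwise term and its gradient
        return [sum(self.v[i][f] * xi for i, xi in enumerate(x)) for f in range(self.k)]

    def decision(self, x):
        s = self._factor_sums(x)
        linear = self.w0 + sum(wi * xi for wi, xi in zip(self.w, x))
        pairwise = 0.5 * sum(
            s[f] ** 2 - sum((self.v[i][f] * xi) ** 2 for i, xi in enumerate(x))
            for f in range(self.k)
        )
        return linear + pairwise

    def fit(self, X, y):
        order = list(range(len(X)))
        for _ in range(self.epochs):
            self.rng.shuffle(order)
            for n in order:
                x, t = X[n], y[n]
                g = -t * sigmoid(-t * self.decision(x))  # d(logistic loss)/d(y_hat)
                s = self._factor_sums(x)
                self.w0 -= self.lr * g
                for i, xi in enumerate(x):
                    if xi == 0.0:
                        continue  # inactive (zero) inputs: their w and v get no update
                    self.w[i] -= self.lr * (g * xi + self.reg * self.w[i])
                    for f in range(self.k):
                        grad = xi * s[f] - self.v[i][f] * xi * xi
                        self.v[i][f] -= self.lr * (g * grad + self.reg * self.v[i][f])
        return self

    def predict(self, X):
        return [1 if self.decision(x) >= 0 else -1 for x in X]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == -1 and p == -1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: 4 binary features, column 1 duplicates column 0.
rows = [(1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 1), (1, 1, 1, 0),
        (0, 0, 0, 1), (0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]
selected = mrmr_select(list(zip(*rows)), labels, k=2)  # -> [0, 2]
X = [[float(r[j]) for j in selected] for r in rows]
fm = FactorizationMachine(n_features=len(selected)).fit(X, labels)
print(selected, round(mcc(labels, fm.predict(X)), 3))
```

On this toy data, mRMR keeps column 0 and skips its exact duplicate (column 1), because the duplicate's redundancy with column 0 outweighs its relevance; column 2 is less relevant but nearly independent of column 0, so it is chosen second. In the FM, the pairwise term is computed as 0.5 * sum_f[(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2], which costs O(kn), and only non-zero inputs are updated; this is what makes FM practical on sparse one-hot data.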
Authors: WANG Mei, LONG Hua, SHAO Yu-Bin, DU Qing-Zhi (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000, China)
Source: Journal of Sichuan University (Natural Science Edition), 2020, No. 1, pp. 96-102 (7 pages). Indexed in: CAS, CSCD, Peking University Core Journals
Funding: National Natural Science Foundation of China (61761025)
Keywords: minimal-redundancy maximal-relevance (mRMR); GTD; factorization machines (FM); Matthews correlation coefficient (MCC); TFM classification model