Abstract
Many studies have applied game theory, k-nearest neighbors and support vector machines to the Global Terrorism Database (GTD) to analyze the clustering of terrorist events, with some success. However, earlier work has not adequately addressed data sparsity or high dimensionality with heavy feature redundancy, both of which can lower the accuracy of cluster classification. This paper proposes TFM, a classification model that combines minimal-redundancy maximal-relevance (mRMR) feature selection with factorization machines (FM). An incremental search finds a near-optimal feature subset to handle high dimensionality and redundancy, and FM handles data sparsity. The TFM model is then applied to the preprocessed terrorist attack data to perform quantitative classification. Four models, naive Bayes (NB), support vector machine (SVM), logistic regression (LR) and TFM, are compared by their Matthews correlation coefficient (MCC). The MCC of TFM is 49.9%, 2.5% and 2.3% higher than that of NB, SVM and LR, respectively, which indicates that the TFM model is feasible to some extent.
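As a rough illustration of two quantities named in the abstract, the sketch below computes a second-order factorization machine score for one sample and the Matthews correlation coefficient from a binary confusion matrix. It is a minimal sketch using the standard textbook definitions, not the authors' implementation: the function names `fm_predict` and `mcc`, the toy weights, and the binary setting for MCC are all assumptions made here for illustration.

```python
import math


def fm_predict(x, w0, w, V):
    """Second-order FM score: w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j.

    x  : list of n feature values (typically sparse, mostly zeros)
    w0 : global bias
    w  : list of n linear weights
    V  : n x k list of latent factor vectors (hypothetical toy values below)
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    # Pairwise term via the O(k*n) identity:
    # 0.5 * sum_f [ (sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy sample with two active features; all values are illustrative only.
    x = [1.0, 0.0, 1.0]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
    print(fm_predict(x, w0=0.5, w=[1.0, 2.0, 3.0], V=V))
    print(mcc(tp=5, tn=5, fp=0, fn=0))
```

Because the pairwise weights are factorized through the latent vectors in `V`, an interaction between two features can be estimated even when they rarely co-occur, which is why FM is commonly used on sparse data. MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, so it stays informative when classes are imbalanced.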
Authors
王美
龙华
邵玉斌
杜庆治
WANG Mei; LONG Hua; SHAO Yu-Bin; DU Qing-Zhi (Kunming University of Science and Technology, Faculty of Information Engineering and Automation, Kunming 650000, China)
Source
《四川大学学报(自然科学版)》
CAS
CSCD
Peking University Core Journals (北大核心)
2020, No. 1, pp. 96-102 (7 pages)
Journal of Sichuan University (Natural Science Edition)
Funding
National Natural Science Foundation of China (61761025)