
iAES: An Automatic IOC Extraction Method for Cybersecurity Blogs (cited by: 9)

An Indicator of Compromise Extraction Method Based on Deep Learning
Abstract Indicators of compromise (IOCs) characterize the behavior of cyber threats. They can be organized according to a description standard and deployed in security systems to defend against attacks. Blogs are an important source of cyber threat intelligence, and IOC analysis results are usually published there first, so collecting IOCs from blogs promptly makes it possible to respond quickly to new threats. However, given the fast-growing volume of blogs, reading them and extracting IOCs by hand is time-consuming and labor-intensive for security experts. To address this problem, we propose iAES (IOC Automatic Extraction System), an automatic IOC extraction method for cybersecurity blogs. iAES crawls blogs automatically and incrementally, cleans the blog pages by filtering out noise, classifies the blogs using both text features and topic features, identifies IOC sentences with regular-expression matching and a deep learning model, and finally formats the IOCs in those sentences based on the semantic similarity of their context. To evaluate iAES, we built three manually labeled datasets: a blog dataset, a sentence dataset, and an IOC dataset. We implemented both iAES and iACE, a recently published IOC extraction scheme, and compared them on the three datasets. The experimental results suggest that, compared with iACE, iAES improves the F1 scores of IOC blog classification, IOC sentence classification, and IOC extraction by 9.46%, 4.25%, and 7.11%, respectively. We further applied iAES to 67,682 posts from 29 security blog sites and manually verified 1,000 randomly selected IOC sentences from its output; the precision reached 94.3%.
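The abstract names regular-expression matching as one step in identifying IOC sentences, but it does not give the expressions themselves. The following is a minimal sketch of that step alone, under stated assumptions: the pattern set `IOC_PATTERNS`, the naive `split_sentences` helper, and the function `find_candidate_sentences` are all hypothetical names chosen for illustration, not part of iAES. The deep learning classifier that the paper pairs with this step is not shown.

```python
import re

# Illustrative patterns only (assumption): the abstract does not list
# the expressions that iAES actually uses.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_candidate_sentences(text):
    """Return (sentence, hits) pairs for sentences matching any pattern.

    hits maps a pattern name to the list of strings it matched.
    """
    results = []
    for sentence in split_sentences(text):
        hits = {}
        for name, pattern in IOC_PATTERNS.items():
            # Strip trailing sentence punctuation that a greedy pattern
            # (mainly the URL one) may have swallowed.
            found = [m.rstrip(".,;") for m in pattern.findall(sentence)]
            if found:
                hits[name] = found
        if hits:
            results.append((sentence, hits))
    return results


if __name__ == "__main__":
    sample = (
        "The dropper contacts 203.0.113.7 over HTTP. "
        "Its MD5 is d41d8cd98f00b204e9800998ecf8427e. "
        "No indicators appear in this sentence."
    )
    for sentence, hits in find_candidate_sentences(sample):
        print(hits, "<-", sentence)
```

Pattern matching of this kind only proposes candidates: a version string or an internal address can look like an IOC, which is presumably why the method combines it with a learned sentence classifier.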
Authors WANG Wei-Ping (王伟平); NING Xiang-Kai (宁翔凯); SONG Hong (宋虹); LU Ming-Ming (鲁鸣鸣); WANG Jian-Xin (王建新) (School of Computer Science and Engineering, Central South University, Changsha 410083)
Source Chinese Journal of Computers (《计算机学报》), 2021, No. 5, pp. 882-896 (15 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding Supported by the National Natural Science Foundation of China (61672543)
Keywords cyber security; indicator of compromise; deep learning; sentence classification; IOC extraction