Abstract
To address the problem of efficiently mining threat intelligence from open-source cybersecurity reports, a threat intelligence mining method based on a threat intelligence named entity recognition (TI-NER) algorithm, called TI-NER-based Intelligence Mining (TI-NER-IM), is proposed. First, IoT security reports from nearly the past 10 years are collected and annotated to construct a threat intelligence entity recognition dataset. Second, to address the shortcomings of traditional entity recognition models in mining threat intelligence indicators of compromise (IoC), a Threat Intelligence Entity Identification based on Self-attention Mechanism and Character Embedding (TIEI-SMCE) model is proposed; the model fuses character embedding information and then uses a self-attention mechanism to capture features such as the latent dependency weights between words and their context, so that threat intelligence IoC entities are identified accurately. Third, a TI-NER algorithm based on the TIEI-SMCE model is proposed. Finally, the model and algorithm are integrated into a new threat intelligence mining method, TI-NER-IM, which automatically mines threat intelligence IoC entities from unstructured and semi-structured cybersecurity reports. Experimental results show that, compared with the BERT-BiLSTM-CRF model, the F1 score of TI-NER-IM improves by 1.43%.
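As an illustration of the two ideas the abstract names (fusing character embedding information and applying self-attention across the words of a sentence), the following is a minimal sketch and not the authors' TIEI-SMCE implementation. It assumes that fusion is done by concatenating a word vector with a character-level vector, and it uses scaled dot-product self-attention with identity projections in place of learned weights; the abstract specifies neither choice. The function names `fuse` and `self_attention` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(word_vecs, char_vecs):
    """Concatenate each word vector with its character-level vector.

    Concatenation is an assumption; the abstract only says the model
    "fuses character embedding information".
    """
    return [w + c for w, c in zip(word_vecs, char_vecs)]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j. A trained model would apply learned
    query/key/value projections first; they are omitted here.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token in the sentence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Context vector: attention-weighted sum of all token vectors.
        outputs.append([sum(row[j] * tokens[j][d] for j in range(len(tokens)))
                        for d in range(dim)])
    return outputs, weights


# Toy inputs: three tokens, 2-d word vectors, 1-d character-level vectors.
word_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
char_vecs = [[0.5], [0.2], [0.9]]

fused = fuse(word_vecs, char_vecs)
outputs, weights = self_attention(fused)
```

Each row of `weights` sums to 1 and can be read as the dependency weights of one word on every other word in the sentence; in a tagger, `outputs` would then feed a classification layer that assigns an entity label to each token.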
Authors
WEI Tao (魏涛)
LI Zhihua (李志华)
WANG Changjie (王长杰)
CHENG Shunhang (程顺航)
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
Source
Computer Science (《计算机科学》)
CSCD
PKU Core Journal (北大核心)
2023, No. 6, pp. 330-337 (8 pages)
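Funding
Intelligent Manufacturing Project of the Ministry of Industry and Information Technology (ZH-XZ-180004)
Fundamental Research Funds for the Central Universities (JUSRP211A41, JUSRP42003)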
Keywords
Threat intelligence mining
Natural language processing
Entity extraction
Indicators of compromise (IoC)