A phishing website identification algorithm based on URL text features and link relations (cited by: 3)
Abstract: To improve the accuracy of phishing website identification, this paper analyzes the uniform resource locator (URL) text data of phishing websites, combines it with the network topology features formed by the link relations among phishing websites, and proposes FAUFL, a phishing website identification algorithm based on URL text features and link relations. The algorithm works in three steps. First, URL text features are taken as input and the random forest algorithm is used to build a discriminator based on URL text features. Second, link relations are taken as input to construct a related web page group, and a related-web-page-group algorithm based on maximum-flow cutting is used to build a discriminator based on link relations. Third, the results of these two discriminators are taken as input and evaluated further with the Bagging algorithm. Test results show that FAUFL achieves an identification accuracy of 99.2%, which is 3.9% higher than that of the URL-text-feature-based algorithm and 5.0% higher than that of the link-relation-based algorithm.
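The abstract does not spell out how the maximum-flow cut yields a related web page group, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes hyperlinks are undirected edges of capacity 1, that known phishing pages (`seeds`) are tied to a virtual source, and that pages assumed unrelated (`outsiders`) are tied to a virtual sink. The pages left on the source side of the minimum cut are taken as the related group. The function name, parameter names, and the toy graph are all hypothetical.

```python
from collections import defaultdict, deque


def min_cut_community(edges, seeds, outsiders):
    """Return the pages on the source side of a minimum source-sink cut.

    Assumptions (not from the paper): links are undirected with capacity 1,
    `seeds` are known phishing pages, `outsiders` are pages assumed unrelated.
    """
    if set(seeds) & set(outsiders):
        raise ValueError("a page cannot be both a seed and an outsider")

    inf = float("inf")
    source, sink = "__source__", "__sink__"  # virtual nodes
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities

    for a, b in edges:
        cap[a][b] += 1
        cap[b][a] += 1
    for s in seeds:
        cap[source][s] = inf
        cap[s][source] += 0  # make sure the reverse entry exists
    for t in outsiders:
        cap[t][sink] = inf
        cap[sink][t] += 0

    def reachable_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: keep augmenting along shortest paths until none is left.
    while True:
        parent = reachable_parents()
        if sink not in parent:
            # Pages still reachable from the source form the related group.
            return set(parent) - {source}
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow


# Toy graph: two tightly linked triangles joined by a single link.
links = [
    ("p1", "p2"), ("p2", "p3"), ("p1", "p3"),  # suspected phishing cluster
    ("g1", "g2"), ("g2", "g3"), ("g1", "g3"),  # unrelated cluster
    ("p3", "g1"),                              # the only bridge
]
community = min_cut_community(links, seeds={"p1"}, outsiders={"g2"})
print(sorted(community))
```

In this toy graph the cheapest cut severs only the bridge between the two triangles, so the group grown from the seed `p1` is its own triangle. How FAUFL turns group membership into a per-site verdict, and how that verdict is combined with the random forest output under Bagging, is not described in the abstract and is not modeled here.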
Authors: Zhao Dunyu (赵蹲宇); Zhang Zhaoxin (张兆心) (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)
Source: Chinese High Technology Letters (《高技术通讯》), PKU Core Journal, 2017, No. 8, pp. 708-717 (10 pages)
Funding: Supported by the National Key R&D Program of China (SQ2017YFGX110125-01), the National Natural Science Foundation of China (61370215, 61370211, 61402137), the National Science and Technology Support Program (2012BAH45B01), and the National Information Security Program (2017A065, 2017A111).
Keywords: phishing website; fusion algorithm; uniform resource locator (URL); text feature; link relation