An Entity Resolution and Correlation Measure Method for Security Fields
Cited by: 3
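Abstract: The enormous commercial opportunity latent in big data has set off a wave of big data industrialization, and Internet data, with its sheer volume and ease of acquisition, has become the primary target of analysis. Thanks to the growth of Internet big data, investigative methods in the security field have moved beyond traditional after-the-fact investigation and targeted surveillance to preventive analysis, which can to some extent keep harm from occurring. Industrial-scale mining of Internet data faces two basic problems: parsing, cleaning, and integrating multi-source data; and entity resolution of Internet identities. Drawing on a concrete security service, this paper presents a general MapReduce-based statistical method for removing redundancy and noise from Internet big data, which substantially reduces the storage space required. On that basis it builds, as a streamlined workflow, a model for recognizing virtual Internet identities. The model quantifies the reliability and the correlation stability of relationships among Internet user identities, and the results are visualized using R.

The tremendous opportunities latent in big data have produced a wave of big data industrialization. Internet big data has become the primary object of analysis because of its great volume and ease of acquisition. As a result, investigators in security fields can now analyze data in advance, before a crime is committed, and to some degree prevent it. Industrial-scale data mining of Internet big data faces two basic difficulties. The first is the pressure on storage devices and the low value density caused by the unstructured and redundant nature of the data. The second is the difficulty of mining itself, which stems from concerns about information security. This paper puts forward a service-oriented statistical method based on MapReduce that significantly reduces data volume. It also describes a streamlined Internet entity resolution model that quantifies the correlation between entities and attributes and the stability of that correlation. A visualization produced in R is presented as a supplement.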
Source: Software Guide (《软件导刊》), 2016, No. 2, pp. 170-174 (5 pages)
Keywords: Internet big data; entity resolution; correlation measure; Hadoop; security fields; industrialization

References (6)

  • [1] ZAHARIA M, CHOWDHURY M, DAS T, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing[C]. Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012.
  • [2] LOHR S. The age of big data[EB/OL]. http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html.
  • [3] LI Guojie, CHENG Xueqi. Big data research: a major strategic field for future science, technology, and economic and social development: the research status of big data and scientific reflections[J]. Bulletin of Chinese Academy of Sciences, 2012, 27(6): 647-657. Cited by: 1606
  • [4] KHURANA H, BASNEY J, BAKHT M, et al. Palantir: a framework for collaborative incident response and investigation[C]. Proceedings of the 8th Symposium on Identity and Trust on the Internet. ACM, 2009: 38-51.
  • [5] DEAN J, GHEMAWAT S. MapReduce: simplified data processing on large clusters[J]. Communications of the ACM, 2008, 51(1): 107-113.
  • [6] GHEMAWAT S, GOBIOFF H, LEUNG S T. The Google file system[J]. ACM SIGOPS Operating Systems Review, 2003(37): 29-43.
