Micro-array Data Two-stage Parallel K Nearest Neighbor Gene Extraction Based on Hadoop
Times cited: 1
Abstract: In gene information selection, the volume of data is so large that traditional single-threaded classification and query methods cannot meet real-time and extraction-accuracy requirements. To address this, a two-stage parallel computing model is designed on the Hadoop framework. The first stage selects candidate gene subsets in parallel, and the second stage performs parallel K nearest neighbor (KNN) gene information selection, so that parallel computation covers the entire process. To reduce the computational complexity of the algorithm, data screening indicators are defined for the gene microarray data and used to sample it, which reduces the amount of data to be processed while also eliminating data redundancy. Experimental results show that the algorithm runs efficiently, inherits the scalability of the Hadoop programming model, and is highly portable.
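The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general two-stage, map/reduce-style KNN pattern it describes, not the authors' method. Several assumptions are made and should be treated as hypothetical: the paper's "data screening indicator" is replaced by a simple per-gene variance threshold (`screen_genes`), distances are Euclidean, and plain Python functions (`knn_map`, `knn_reduce`, `classify`) stand in for Hadoop Mapper and Reducer tasks running over data splits.

```python
"""Illustrative two-stage map/reduce-style KNN sketch (NOT the paper's method).

Hypothetical assumptions: the gene-screening indicator is a variance threshold,
distances are Euclidean, and map/reduce are plain functions standing in for
Hadoop Mapper/Reducer tasks.
"""
import heapq
import math
from collections import Counter
from statistics import pvariance


def screen_genes(samples, min_variance):
    """Stage 1 (hypothetical indicator): keep gene columns whose population
    variance across samples is at least min_variance, dropping near-constant
    (redundant) genes."""
    n_genes = len(samples[0])
    return [g for g in range(n_genes)
            if pvariance([row[g] for row in samples]) >= min_variance]


def knn_map(split, query, genes, k):
    """Map task: compute the distance from the query to every sample in one
    data split over the selected genes, and emit only the local k nearest
    (distance, label) pairs."""
    local = []
    for features, label in split:
        d = math.sqrt(sum((features[g] - query[g]) ** 2 for g in genes))
        local.append((d, label))
    return heapq.nsmallest(k, local)


def knn_reduce(partials, k):
    """Reduce task: merge the per-split candidate lists into the global k
    nearest neighbours and return the majority label."""
    merged = heapq.nsmallest(k, (p for part in partials for p in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(splits, query, k, min_variance):
    """Run stage 1 (gene screening) and stage 2 (parallel KNN) end to end."""
    rows = [features for split in splits for features, _ in split]
    genes = screen_genes(rows, min_variance)                   # stage 1
    partials = [knn_map(s, query, genes, k) for s in splits]   # stage 2: map
    return knn_reduce(partials, k)                             # stage 2: reduce


if __name__ == "__main__":
    # Two toy "HDFS splits": samples are (expression vector, class label).
    splits = [
        [([1.0, 5.0, 0.1], "A"), ([1.2, 5.0, 0.2], "A")],
        [([8.0, 5.0, 0.1], "B"), ([7.5, 5.0, 0.3], "B")],
    ]
    print(classify(splits, query=[1.1, 5.0, 0.0], k=3, min_variance=0.5))  # -> A
```

In this toy run only gene 0 survives screening (genes 1 and 2 are nearly constant), and the three nearest samples are two "A" and one "B". Emitting only each split's local top k keeps the reducer's input to at most k times the number of splits, and the merge is still exact because every global nearest neighbour must be among the local k nearest of its own split.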
Source: Computer Engineering, 2016, No. 5, pp. 54-59 (6 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
Funding: Supported by the Foundation of the Liaoning Provincial Department of Education (L2012113).
Keywords: Hadoop framework; parallel computing; micro-array sampling; big data; K nearest neighbor; gene information