Abstract
In bioinformatics, mining differential co-expression biclusters helps in studying biological processes that involve change, such as aging and carcinogenesis. Previous definitions of a differential co-expression bicluster measured the difference only at the level of a whole gene set, so the resulting biclusters contained a great deal of noise. To overcome this shortcoming, a new differential co-expression support measure, MiSupport, was proposed to refine the difference from the gene-set level down to the level of individual genes. Based on this definition, the MiCluster algorithm was proposed to mine maximal differential co-expression biclusters from two real gene-chip datasets. MiCluster first constructs a differential weighted undirected sample-sample relational graph from the two real-valued gene expression datasets. It then mines the maximal differential co-expression biclusters from this graph by sample-growth and level-growth, using an accurate candidate generation method and efficient pruning strategies. The experimental results show that MiCluster is faster and more efficient than existing methods. Furthermore, evaluation by Mean Square Error (MSE) score and Gene Ontology (GO) shows that the mined biclusters have greater statistical and biological significance.
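As an illustration of the MSE evaluation mentioned above, the following minimal sketch scores a row-constant bicluster by the mean squared deviation of each selected gene's values from that gene's own mean over the selected samples. This formula is an assumption chosen to match the "constant row" keyword; the abstract does not state the exact MSE definition used in the paper, and the function name `row_constant_mse` and the toy matrix `expr` are hypothetical.

```python
def row_constant_mse(matrix, genes, samples):
    """Mean squared deviation of each selected gene from its own row mean.

    Assumed scoring rule (not taken from the paper): a perfectly
    row-constant bicluster scores 0; larger values mean more noise.
    """
    if not genes or not samples:
        raise ValueError("bicluster needs at least one gene and one sample")
    total = 0.0
    for g in genes:
        # Expression values of gene g restricted to the selected samples.
        values = [matrix[g][s] for s in samples]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total / (len(genes) * len(samples))


# Toy expression matrix: rows are genes, columns are samples (made-up values).
expr = [
    [1.0, 1.0, 5.0, 1.0],
    [2.0, 2.0, 0.0, 2.0],
    [3.0, 7.0, 3.0, 3.0],
]

# Genes 0 and 1 are constant over samples 0, 1, 3 but not over 0, 1, 2.
tight = row_constant_mse(expr, genes=[0, 1], samples=[0, 1, 3])
loose = row_constant_mse(expr, genes=[0, 1], samples=[0, 1, 2])
```

Under this assumed score, a lower value indicates a cleaner row-constant pattern, so `tight` should come out smaller than `loose`.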
Source
《计算机应用》
CSCD
PKU Core Journal (北大核心)
2013, No. 8, pp. 2188-2193, 2239 (7 pages in total)
Journal of Computer Applications
Funding
National Basic Research Program of China (973 Program) (2012CB316203)
National Natural Science Foundation of China (61272121)
Keywords
gene chip
gene co-expression
bicluster
differential
constant row