Abstract
Microarray data in gene expression analysis are high-dimensional and highly redundant, which makes gene expression data difficult to classify. The least-squares support vector machine (LS-SVM), a machine learning algorithm, is computationally efficient and therefore offers an effective approach to mining such data. For two typical cancer microarray data sets (the colon cancer set and the leukemia set), the data are normalized and their correlation coefficient matrix is computed. Principal component analysis is then used for dimensionality reduction, yielding informative gene sets (10 genes each) for feature selection and classification. Finally, an LS-SVM classifier is applied to the informative gene sets. Experimental results show that the algorithm achieves leave-one-out cross-validation accuracies of 97.5% and 100% on the two data sets, respectively, higher than those of the other classifiers compared, and can provide a reliable diagnostic basis for further clinical application.
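The pipeline summarized in the abstract (normalization, PCA-based reduction to 10 dimensions, LS-SVM classification, leave-one-out evaluation) can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the data are synthetic, the function names, the RBF kernel, and the values of `gamma` and `sigma` are all assumptions, and the "10 genes" are approximated here by the first 10 principal components. PCA on z-scored data is equivalent to an eigendecomposition of the correlation matrix, which is why no explicit correlation matrix appears in the code.

```python
import numpy as np

def standardize(train, test):
    """Z-score both sets using statistics from the training set only."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant genes
    return (train - mu) / sd, (test - mu) / sd

def pca_components(X, k):
    """Top-k principal directions of already-centered data, via SVD."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T  # shape (n_genes, k)

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    """Solve the LS-SVM classifier linear system for the bias b and alpha.

    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega_ij = y_i * y_j * K(x_i, x_j) and labels y in {-1, +1}.
    """
    n = len(y)
    omega = (y[:, None] * y[None, :]) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma):
    """Sign of the decision function sum_i alpha_i y_i K(x, x_i) + b."""
    score = rbf_kernel(X_test, X_train, sigma) @ (alpha * y_train) + b
    return np.where(score >= 0, 1.0, -1.0)

def loo_accuracy(X, y, k=10, gamma=10.0, sigma=10.0):
    """Leave-one-out accuracy; scaling and PCA are refit inside each fold."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        X_tr, X_te = standardize(X[mask], X[i:i + 1])
        W = pca_components(X_tr, k)
        Z_tr, Z_te = X_tr @ W, X_te @ W
        b, alpha = lssvm_fit(Z_tr, y[mask], gamma, sigma)
        pred = lssvm_predict(Z_tr, y[mask], alpha, b, Z_te, sigma)
        hits += int(pred[0] == y[i])
    return hits / n

# Synthetic stand-in for a microarray matrix (samples x genes); NOT real data.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
y = np.repeat([1.0, -1.0], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[:, :50] += y[:, None]  # first 50 "genes" carry the class signal

acc = loo_accuracy(X, y)
print(f"LOO accuracy on synthetic data: {acc:.3f}")
```

Refitting the scaling and the PCA inside each fold keeps the held-out sample from influencing the features it is tested on. Whether the paper's evaluation was organized this way is not stated in the abstract.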
Author
Gao Zhenbin (高振斌), Institute of Mathematics and Applied Mathematics, School of Statistics, Xi'an University of Finance and Economics, Xi'an 710100, Shaanxi, China
Source
Computer Applications and Software (《计算机应用与软件》)
PKU Core Journal (北大核心)
2019, Issue 8, pp. 288-292 (5 pages)
Keywords
Microarray
Feature classification
Dimensionality reduction
Least-squares support vector machine (LS-SVM)