Abstract
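This article studies gastric cancer subtype classification based on microarray gene expression data. Because such data have few samples, high dimensionality, and substantial noise, dimension reduction is key to successful classification. The authors applied two dimension reduction methods, principal component analysis (PCA) and partial least squares (PLS), to gastric cancer subtype classification, using support vector machine (SVM) and K-nearest neighbor (KNN) classifiers to classify subtypes on two gastric cancer data sets. Classification performance was slightly better than that of conventional medical diagnosis, with accuracy reaching as high as 100%. The results show that PCA and PLS extract classification-relevant features effectively and greatly reduce the dimensionality of gene expression data while maintaining high classification accuracy.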
Gastric cancer is one of the most common malignant tumors in the world, yet medicine still has no uniform method for classifying it. Under the Lauren classification, gastric cancer is divided into the intestinal type and the diffuse type. Knowing the subtype is important for deciding on treatment. Gene expression data are currently a major focus of cancer research and are expected to have a strong impact on the diagnosis and treatment of gastric cancer. Gene expression profiles generally have few samples and high dimensionality, owing to the cost of the experiments and other factors, so traditional classification methods often fail, and the dimensionality of the data should be reduced before classification. In this paper, the authors applied partial least squares (PLS) and principal component analysis (PCA) to the classification of gastric cancer, using two different gastric cancer data sets. The classification results obtained with the two dimension reduction methods were compared using support vector machine (SVM) and K-nearest neighbor (KNN) classifiers. The experiments showed that both PLS and PCA performed well as dimension reduction methods and that the resulting classification was also good. The merits and demerits of the two methods are also discussed.
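The reduce-then-classify workflow described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the data are synthetic, only the PCA and KNN branch is shown (PLS and SVM are omitted), and every name here (`make_data`, `top_components`, `knn_predict`) and every parameter value is an assumption chosen for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_data(n_per_class, n_genes, rng):
    """Synthetic stand-in for expression data: class 1 is shifted in 10 genes."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            if label == 1:
                for j in range(10):
                    row[j] += 4.0
            X.append(row)
            y.append(label)
    return X, y


def top_components(X, k, iters=200, seed=0):
    """Approximate the k leading principal directions of centered rows X
    by power iteration on X^T X, deflating X after each direction."""
    rng = random.Random(seed)
    X = [row[:] for row in X]
    d = len(X[0])
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            scores = [dot(row, v) for row in X]  # X v
            w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(d)]  # X^T (X v)
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        comps.append(v)
        # Deflation: remove the found direction from every sample.
        deflated = []
        for row in X:
            s = dot(row, v)
            deflated.append([x - s * vj for x, vj in zip(row, v)])
        X = deflated
    return comps


def knn_predict(train_Z, train_y, z, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(range(len(train_Z)), key=lambda i: sq_dist(train_Z[i], z))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


rng = random.Random(42)
X_train, y_train = make_data(20, 50, rng)  # few samples, many features
X_test, y_test = make_data(10, 50, rng)

# Center both sets with the training mean only, so no test information leaks in.
mean = [sum(col) / len(X_train) for col in zip(*X_train)]


def center(X):
    return [[x - m for x, m in zip(row, mean)] for row in X]


def project(X, comps):
    return [[dot(row, c) for c in comps] for row in X]


components = top_components(center(X_train), k=2)
Z_train = project(center(X_train), components)  # 50 features reduced to 2
Z_test = project(center(X_test), components)

preds = [knn_predict(Z_train, y_train, z) for z in Z_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"test accuracy with 2 components: {accuracy:.2f}")
```

The components are fitted on the training samples alone and then applied to the held-out samples, which is the usual way to keep an accuracy estimate honest when samples are scarce. In practice a numerical library would normally replace the hand-written power iteration.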
Source
《生物物理学报》
CAS
CSCD
Peking University Core Journals
2009, Issue 2, pp. 141-147 (7 pages)
Acta Biophysica Sinica
Funding
National Natural Science Foundation of China (60234020)
Keywords
Principal component analysis
Partial least squares
Classification
Gastric cancer
Microarray gene expression data
Gene expression profiling