Abstract
Support vector machine (SVM), a classification technique based on the principle of structural risk minimization, has attracted growing attention from researchers in China and abroad. This study proposes a classification-based feature selection algorithm that searches a set of relevant or irrelevant candidate features for the best subset for classification, and that ranks all features by the strength of their correlation with the class labels. The algorithm is applied to the discrimination of type II diabetes and the screening of its risk factors, which verifies its reliability and feasibility. When the selected subset {waistline, waistline/hip-girth, diastolic blood pressure, age} is used as the input vector, sensitivity, specificity and accuracy are highest, at 0.8666, 0.6420 and 0.7014 respectively. The algorithm is also compared with principal component analysis, and the experiments show that it is superior to principal component analysis for feature extraction. The algorithm is therefore an effective method for classification and risk-factor screening, and offers a workable approach to problems of this kind.
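The sensitivity, specificity and accuracy quoted in the abstract are standard confusion-matrix quantities. The following is a minimal sketch of how they are computed, not the authors' code: the function name `classification_metrics`, the label encoding (1 = diabetic, 0 = non-diabetic) and the toy labels are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels.

    Hypothetical helper: the 1 = positive (diabetic) encoding is an
    assumption for illustration, not taken from the paper.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1   # diseased case correctly flagged
            else:
                fn += 1   # diseased case missed
        else:
            if pred == positive:
                fp += 1   # healthy case wrongly flagged
            else:
                tn += 1   # healthy case correctly cleared

    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / total if total else float("nan")
    return sensitivity, specificity, accuracy


# Toy labels, invented purely to exercise the function.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
sens, spec, acc = classification_metrics(y_true, y_pred)
print(f"sensitivity={sens:.4f} specificity={spec:.4f} accuracy={acc:.4f}")
```

Accuracy is the prevalence-weighted average of sensitivity and specificity, so it always lies between the two, as it does in the figures reported above.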
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2007, No. 20, pp. 210-213 (4 pages)
Funding
National Natural Science Foundation of China (Grant No. 60473031)