Abstract
Thyroid disease is common in endocrinology, and accurately identifying the different types of thyroid disease is a primary problem in clinical diagnosis. Working from thyroid test indicator data, this paper proposes a new method for classifying thyroid disease. The method first applies principal component analysis (PCA) to the dataset for feature selection, reducing the data dimension, and then performs classification with the rotation forest ensemble algorithm. Rotation forest makes the differences among the base classifiers more pronounced, which improves the accuracy of the ensemble, and it can also reduce processing time. Experiments show that the method reaches a classification accuracy of 96.28% on a dataset from the UCI Machine Learning Repository. To further verify its effectiveness, the method was also evaluated on a real clinical medical dataset that is larger and higher-dimensional than the UCI standard dataset. On that dataset, in a comparison with other methods, it reaches a classification accuracy of 96.37%.
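The two-stage pipeline described in the abstract (PCA for dimension reduction, then a rotation forest) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy and scikit-learn are available, uses synthetic data from `make_classification` as a stand-in for the thyroid datasets, and simplifies the usual rotation forest recipe: each tree gets a block-diagonal rotation built from PCA on random feature subsets, with no class elimination step. The class name `RotationForestSketch` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class RotationForestSketch:
    """Simplified rotation forest: one PCA-based rotation per decision tree."""

    def __init__(self, n_trees=10, n_subsets=3, sample_frac=0.75, seed=0):
        self.n_trees = n_trees
        self.n_subsets = n_subsets
        self.sample_frac = sample_frac
        self.rng = np.random.default_rng(seed)

    def _rotation(self, X):
        n_samples, n_features = X.shape
        R = np.zeros((n_features, n_features))
        # Randomly split the feature indices into disjoint subsets.
        subsets = np.array_split(self.rng.permutation(n_features), self.n_subsets)
        for cols in subsets:
            # Fit PCA on a random subsample, restricted to this feature subset.
            n_rows = int(self.sample_frac * n_samples)
            rows = self.rng.choice(n_samples, n_rows, replace=False)
            pca = PCA().fit(X[np.ix_(rows, cols)])
            # Keep all components and place the loadings in the matching block.
            R[np.ix_(cols, cols)] = pca.components_.T
        return R

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = []
        for _ in range(self.n_trees):
            R = self._rotation(X)
            tree = DecisionTreeClassifier(random_state=int(self.rng.integers(1 << 31)))
            self.models_.append((R, tree.fit(X @ R, y)))
        return self

    def predict(self, X):
        # Average the class probabilities of all rotated trees.
        proba = np.mean([t.predict_proba(X @ R) for R, t in self.models_], axis=0)
        return self.classes_[np.argmax(proba, axis=1)]


# Synthetic stand-in data (assumption: NOT the UCI or clinical thyroid data).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# Stage 1: PCA reduces the dimension. Stage 2: rotation forest classifies.
reducer = PCA(n_components=6).fit(X_tr)
model = RotationForestSketch().fit(reducer.transform(X_tr), y_tr)
pred = model.predict(reducer.transform(X_te))
accuracy = float(np.mean(pred == y_te))
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Decision trees are used as base classifiers here because they are sensitive to rotations of the feature axes, so a different rotation per tree is one way to obtain the diversity among base classifiers that the abstract refers to. The accuracy printed by this sketch says nothing about the figures reported in the paper.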
Source
Computer and Modernization (《计算机与现代化》)
2016, No. 3, pp. 11-15 (5 pages)
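Funding
Natural Science Foundation of Shanghai (15ZR1400900)
Science and Technology Innovation Action Plan of the Shanghai Science and Technology Commission (13511504905)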
Keywords
thyroid disease
ensemble classification
rotation forest
feature selection
principal component analysis