Abstract
Objective: To explore how a support vector machine with prior knowledge (P-SVM), a data mining algorithm, can be applied to the automatic classification of traditional Chinese medicine (TCM) syndrome information. Methods: More than 300,000 TCM syndrome literature records collected in a TCM syndrome database served as the training and test datasets, and TCM domain knowledge served as prior knowledge. The confidence of the samples was introduced into the SVM model through a weighted classification margin, and a classification confidence was computed for each result. Results: With TCM domain knowledge incorporated, classification accuracy improved substantially, reaching approximately 95%. Conclusion: P-SVM is a relatively successful application of statistical learning theory (SLT) to small-sample datasets. It classifies TCM syndrome information effectively and demonstrates the use of data mining techniques in TCM syndrome research. The experiments show that P-SVM combines prior knowledge with the information contained in the training samples well, making it an effective algorithm for correctly classifying TCM syndrome information.
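The abstract does not give the P-SVM formulation. The following is a minimal sketch, assuming the "weighted classification margin" behaves like a confidence-weighted soft-margin SVM: each training sample's hinge loss is scaled by a prior-knowledge confidence c_i in [0, 1], similar in spirit to fuzzy SVMs. The names `train_weighted_svm` and `decision` and the hyperparameters `lam`, `lr` and `epochs` are hypothetical. The model is linear and trained by full-batch subgradient descent; it is an illustration of the weighting idea, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def train_weighted_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    confidence: Sequence[float],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Hypothetical confidence-weighted linear SVM (assumed form, not from the paper).

    Minimises  lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w.x_i + b))
    where y_i is +1 or -1 and c_i is the prior-knowledge confidence of sample i.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, confidence):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample inside the margin or misclassified
                # Subgradient of the hinge term, scaled by the sample's confidence
                for j in range(d):
                    grad_w[j] -= ci * yi * xi[j] / n
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed decision value; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

In this sketch a sample with c_i = 0 contributes nothing to training, while c_i = 1 gives it full influence. The magnitude of `decision(...)` could serve as a rough per-result confidence score, although the abstract does not state how the paper computes its classification confidence.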
Source
Modernization of Traditional Chinese Medicine and Materia Medica-World Science and Technology (《世界科学技术-中医药现代化》)
2007, No. 1, pp. 28-31 (4 pages)
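Funding
Project funded by the Guangdong Provincial Administration of Traditional Chinese Medicine (1040014): Development and Utilization of a Thematic TCM Syndrome Database
Ministry of Science and Technology of China, 10th Five-Year Plan Key Technologies R&D Program project (2004BA721A02): Demonstration Study on Standardized Syndrome Differentiation and Efficacy Evaluation of Acute Ischemic Stroke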
Keywords
TCM Syndrome
data mining
information technology
support vector machine