Abstract
This paper studies data mining techniques for tumor electronic medical records, focusing on data extraction and on mining and analysis experiments. Data extraction selectively extracts information from the record text and stores the results in structured form, providing the data foundation for the study of classification algorithms. The work concentrates on Chinese word segmentation of tumor electronic medical records and on the selection of a classification mining algorithm. For Chinese word segmentation, an improved reverse maximum matching algorithm is proposed that raises both segmentation accuracy and segmentation efficiency. For classification mining, the C4.5 algorithm and the BP neural network algorithm, both known for good classification performance, are each used in classification experiments. A comparison of their performance indicates that, for classification mining of tumor electronic medical records, C4.5 is better suited to assisting doctors in diagnosing tumor diseases, improving diagnostic precision and efficiency and, in turn, the cure rate of tumor patients.
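As an illustration of the segmentation step named in the abstract, the following is a minimal sketch of the baseline reverse maximum matching algorithm. It is not the improved variant proposed in the paper, whose details are not given in this record. The function name, the toy dictionary, and the `max_word_len` parameter are assumptions made for the example.

```python
def reverse_maximum_matching(text, dictionary, max_word_len=None):
    """Segment `text` by scanning from right to left and, at each position,
    taking the longest dictionary word that ends there.

    Baseline algorithm only; `dictionary` is a hypothetical word set.
    """
    if max_word_len is None:
        # Longest dictionary entry bounds the matching window.
        max_word_len = max((len(w) for w in dictionary), default=1)

    tokens = []
    end = len(text)
    while end > 0:
        # Start with the widest window that ends at `end`.
        start = max(0, end - max_word_len)
        # Shrink the window from the left until it matches a dictionary
        # word or only a single character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        tokens.append(text[start:end])
        end = start

    tokens.reverse()  # Tokens were collected right to left.
    return tokens


# Toy dictionary (an assumption, not taken from the paper).
toy_dictionary = {"研究", "研究生", "生命", "命", "的", "起源"}
segments = reverse_maximum_matching("研究生命的起源", toy_dictionary)
print(segments)  # ['研究', '生命', '的', '起源']
```

Scanning from the right lets the sketch split "研究生命" as "研究 / 生命", whereas a forward longest match against the same dictionary would first take "研究生" and leave "命" stranded. Unmatched characters fall through as single-character tokens.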
Authors
TONG Gang (童刚); JIANG Ning (姜宁); LIU Huan (刘焕)
School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
Source
Computer Technology and Development (《计算机技术与发展》), 2020, No. 8, pp. 152-156 (5 pages)
Funding
National Natural Science Foundation of China (61572268).
Keywords
tumor electronic medical records
assisted diagnosis
reverse maximum matching segmentation
C4.5
neural network