67 articles found
Polarimetric Meteorological Satellite Data Processing Software Classification Based on Principal Component Analysis and Improved K-Means Algorithm  Cited by: 1
1
Authors: Manyun Lin, Xiangang Zhao, Cunqun Fan, Lizi Xie, Lan Wei, Peng Guo 《Journal of Geoscience and Environment Protection》 2017, No. 7, pp. 39-48 (10 pages)
With the growing variety of application software in meteorological satellite ground systems, how to provide appropriate hardware resources and improve software efficiency is receiving more and more attention. This paper proposes a software classification method based on software operating characteristics. The method uses run-time resource consumption to describe how the software runs. First, principal component analysis (PCA) is used to reduce the dimension of the software running feature data and to interpret the software characteristic information. Then a modified K-means algorithm is used to classify the meteorological data processing software. Finally, the classification is combined with the PCA results to explain the significance of the operating characteristics of each type of software. The result serves as a basis for optimizing the allocation of hardware resources to software and improving the efficiency of software operation.
Keywords: principal component analysis; improved k-means algorithm; meteorological data processing; feature analysis; similarity algorithm
A State of Art Analysis of Telecommunication Data by k-Means and k-Medoids Clustering Algorithms
2
Authors: T. Velmurugan 《Journal of Computer and Communications》 2018, No. 1, pp. 190-202 (13 pages)
Cluster analysis is one of the major data analysis methods and is widely used in many practical applications in emerging areas of data mining. A good clustering method produces high-quality clusters with high intra-cluster similarity and low inter-cluster similarity. Clustering techniques are applied in different domains to predict future trends from the available data and its real-world uses. This research evaluates the performance of two of the most representative partition-based clustering algorithms, k-Means and k-Medoids. A state-of-the-art analysis of the two algorithms is implemented, and their performance is assessed by the quality of the clustering results, the execution time, and other components. Telecommunication data is the source data for the analysis: connection-oriented broadband data is given as input to assess the clustering quality of the algorithms, and the distance between the server locations and their connections is used for clustering. The execution time of each algorithm is analyzed and the results are compared with one another. The results of the comparison study are satisfactory for the chosen application.
Keywords: k-means algorithm; k-medoids algorithm; data clustering; time complexity; telecommunication data
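The difference between the two algorithms compared in the abstract above comes down to how a cluster is represented. The following is a minimal sketch, assuming Euclidean distance and hypothetical sample points; the function names `centroid` and `medoid` are illustrative and not taken from the paper.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points (the k-means representative)."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def medoid(points):
    """Member point with the smallest total distance to all members (the k-medoids representative)."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Hypothetical cluster with one far-away point.
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
print(centroid(cluster))  # the mean is pulled toward the far-away point
print(medoid(cluster))    # the medoid is always one of the actual data points
```

Because the medoid must be an actual member of the cluster, it is less sensitive to a single distant point than the mean, at the cost of comparing every pair of points.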
An efficient enhanced k-means clustering algorithm  Cited by: 30
3
Authors: FAHIM A.M., SALEM A.M., TORKEY F.A., RAMADAN M.A. 《Journal of Zhejiang University-Science A (Applied Physics & Engineering)》 SCIE, EI, CAS, CSCD, 2006, No. 10, pp. 1626-1633 (8 pages)
In k-means clustering, we are given a set of n data points in d-dimensional space R^d and an integer k, and the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. In this paper, we present a simple and efficient clustering algorithm based on the k-means algorithm, which we call the enhanced k-means algorithm. The algorithm is easy to implement, requiring only a simple data structure that keeps some information from each iteration for use in the next. Our experimental results demonstrated that our scheme can improve the computational speed of the k-means algorithm by the magnitude in the total number of distance calculations and the overall time of computation.
Keywords: clustering algorithms; cluster analysis; k-means algorithm; data analysis
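The k-means problem stated in the abstract above is usually attacked with the plain Lloyd iteration. The following is a minimal sketch of that baseline only, not of the paper's enhanced algorithm, whose data structure is not described here. The function name `kmeans`, the seeded random initialization, and the sample points are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd iteration on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random data points as initial centers
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
                )
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        if new_centers == centers:  # no center moved, so the assignment is stable
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two well-separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(data, 2)
print(sorted(centers))
```

Every iteration of this baseline recomputes the distance from each point to each center, which is the cost the abstract measures in distance calculations.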
Anomaly Detection of Store Cash Register Data Based on Improved LOF Algorithm  Cited by: 3
4
Authors: Ke Long, Yuhang Wu, Yufeng Gui 《Applied Mathematics》 2018, No. 6, pp. 719-729 (11 pages)
As cash register systems have become common in shopping malls, detecting abnormal states of the cash register system has become a topic of growing interest. This paper analyzes the transaction data of a shopping mall. When calculating the degree of difference between data, the coefficient of variation is used as the attribute weight, the weighted Euclidean distance is used to calculate the degree of difference, and k-means clustering is used to classify the different time periods. The LOF algorithm is then applied to measure the outlier degree of the transaction data in each time period: an initial threshold is set to detect outliers, the outliers are deleted, and SAX detection is performed on the data set. If the data set does not pass the test, the outlying domain is gradually expanded and the process is repeated, optimizing the outlier threshold to improve the sensitivity of the detection algorithm and reduce false positives.
Keywords: cash register data; anomaly detection; k-means clustering; optimized LOF algorithm; SAX test
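The weighting step in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions: the coefficient of variation is taken as population standard deviation divided by mean, the weights are normalized to sum to 1, and the weight multiplies the squared difference inside the square root. The paper may define any of these differently, and the function names and sample rows are hypothetical.

```python
import math
import statistics

def cv_weights(rows):
    """One weight per attribute, proportional to its coefficient of variation.

    Assumes every attribute has a nonzero mean and at least one attribute varies.
    """
    columns = list(zip(*rows))
    cvs = [statistics.pstdev(col) / abs(statistics.mean(col)) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]

def weighted_euclidean(x, y, weights):
    """Euclidean distance with a per-attribute weight on each squared difference."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# Hypothetical transactions: (amount, number of items).
rows = [(120.0, 3.0), (80.0, 2.0), (400.0, 3.0), (95.0, 2.0)]
weights = cv_weights(rows)
print(weights)
print(weighted_euclidean(rows[0], rows[2], weights))
```

Attributes that vary more relative to their mean receive larger weights, so they contribute more to the degree of difference between two records.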
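Network Intrusion Detection Based on an Improved K-means Data Clustering Algorithm  Cited by: 3
5
Authors: 黄俊萍 《成都工业学院学报》 2024, No. 2, pp. 58-62, 97 (6 pages)
As intrusion techniques are continually updated and upgraded, traditional intrusion detection methods suffer from declining accuracy and longer detection times and can no longer meet network defense requirements. An improved K-means data clustering algorithm is therefore proposed to cope with escalating network intrusions. Firewall logs are first converted into numerical values; particle swarm optimization is then used to find the optimal initial cluster centers, which constitutes the improvement to the K-means data clustering algorithm; finally, the computed feature values are used as input to detect network intrusions accurately. The results show that the Davies-Bouldin index of the improved K-means algorithm is smaller than that of the original algorithm and below 0.6 in all cases, so the goal of the improvement is achieved. With the improved K-means algorithm, the accuracy on every sample is above 90% and the detection time is below 10 s in all cases, both better than the comparison, indicating that the method performs more accurate network intrusion detection efficiently.
Keywords: improved K-means data clustering algorithm; firewall logs; intrusion detection features; particle swarm optimization; network intrusion detection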
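Power Load Data Clustering Method Based on an Improved k-means Algorithm
6
Authors: 吕相沅, 陈安琪, 刘青, 程昱舒 《电子设计工程》 2024, No. 20, pp. 121-124, 129 (5 pages)
To address the difficulty existing data clustering methods have in effectively clustering power system load data, this paper designs a power load data clustering method based on an improved k-means algorithm. Based on the process by which center points of power load data are generated, a decision function on the spacing between center points and the distance between clusters is constructed to screen the cluster centers of the power load data. Once the cluster centers are determined, a data separation method is used to separate normal load data from abnormal load data; data continuity should be preserved during separation to avoid losing potentially useful data. The improved k-means algorithm is used to analyze the power load data and compute the Euclidean distance between data of different classes. A pointer matrix is set up, the center points of different classes are merged, interval normalization is applied to the original data, and the power spectral density of the transmission over the load-data clustering channel is obtained for each cluster. The data are then assigned to the clusters in turn, completing the clustering of the power load data. Experimental results show that the method's clustering range is 0.3-0.48 pu for station 1 and 0.34-0.47 pu for station 2, which is better than the comparison methods and closest to the ideal clustering range, indicating good clustering performance.
Keywords: improved k-means algorithm; power load; data clustering; interval normalization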
Distance function selection in several clustering algorithms
7
Authors: LU Yu 《Journal of Chongqing University》 CAS, 2004, No. 1, pp. 47-50 (4 pages)
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects the clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to reveal the shape of the clusters. In practice, it is suggested to use the different distance functions separately, compare the clustering results, and pick out the "swing points". Such points may reveal more information to data analysts.
Keywords: distance function; clustering algorithms; k-means; dendrogram; data mining
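Optimization of Initial Cluster Centers for k-means  Cited by: 29
8
Authors: 仝雪姣, 孟凡荣, 王志晓 《计算机工程与设计》 CSCD, PKU Core, 2011, No. 8, pp. 2721-2723, 2788 (4 pages)
To address the sensitivity of the traditional k-means algorithm to the initial cluster centers, an improved k-means algorithm is proposed that selects the initial cluster centers according to the distribution of the data samples. The algorithm uses a greedy strategy to build K data sets whose sizes are closely related to the actual distribution of the data and whose members lie close to one another. The mean of the data in each set is taken as an initial cluster center, and the resulting initial centers are very close to the centers that the iterative clustering algorithm is expected to reach. Theoretical analysis and experimental results show that the improved algorithm improves clustering performance, yields stable clustering results, and achieves higher classification accuracy.
Keywords: clustering; k-means algorithm; data distribution; initial cluster centers; improved algorithm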
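Application of an Improved k-means Clustering Algorithm to Customer Segmentation  Cited by: 8
9
Authors: 杜巍, 赵春荣, 黄伟建 《河北经贸大学学报》 CSSCI, PKU Core, 2014, No. 1, pp. 118-121 (4 pages)
Cluster analysis is an important data mining method. Applied to customer segmentation, it can identify distinct customer groups so that marketing policies can be tailored to each group and enterprise benefit maximized. To address the shortcomings of the k-means algorithm in cluster analysis, an improved clustering algorithm is used to segment customers in the tourism industry, enabling enterprises to segment and plan customer groups more reasonably and to treat groups with different needs differently. Good results were obtained, verifying the feasibility and efficiency of the improved algorithm.
Keywords: cluster analysis; customer segmentation; data mining; improved k-means algorithm; customer groups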
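Research on Methods for Determining the Number of Clusters in the K-means Clustering Algorithm  Cited by: 19
10
Authors: 刘飞, 唐雅娟, 刘瑶 《电子设计工程》 2017, No. 15, pp. 9-13 (5 pages)
Among data mining algorithms, K-means clustering is a common unsupervised learning method; the more dissimilar the data objects in different clusters and the more similar the data objects within a cluster, the better the clustering. However, the number of clusters is usually a parameter set in advance by an experienced user. This paper proposes a clustering method that determines the number of clusters adaptively (SKKM for short), using the SSE and the number of clusters as measures. Experiments on UCI data and simulated data objects validate the SKKM algorithm; the results show that the improved algorithm quickly finds the number of clusters in the data and improves the performance of the algorithm.
Keywords: k-means algorithm; number of clusters; initial cluster centers; data mining; improved k-means algorithm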
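The SSE measure named in the abstract above is the sum of squared distances from each point to the mean of its cluster. The following is a minimal sketch of that quantity only; how SKKM combines it with the number of clusters is not described here, and the function name `sse` and the one-dimensional sample data are assumptions for illustration.

```python
def sse(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for members in clusters:
        if not members:
            continue  # an empty cluster contributes nothing
        dims = len(members[0])
        center = [sum(m[d] for m in members) / len(members) for d in range(dims)]
        total += sum((m[d] - center[d]) ** 2 for m in members for d in range(dims))
    return total

# Hypothetical 1-D data grouped with one cluster and then with two.
one_cluster = [[(0.0,), (2.0,), (10.0,), (12.0,)]]
two_clusters = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
print(sse(one_cluster), sse(two_clusters))
```

SSE generally falls as the number of clusters grows, so it cannot select the number of clusters on its own; it has to be weighed against the number of clusters, which is presumably why the abstract lists both as measures.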
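Analysis of the Partition-Based K-means Clustering Algorithm for Data Mining  Cited by: 19
11
Authors: 曾俊 《现代电子技术》 PKU Core, 2020, No. 3, pp. 14-17 (4 pages)
To improve the effectiveness of cluster analysis in data mining, an improved K-means algorithm is proposed on the basis of an analysis of data mining, cluster analysis, and the traditional K-means algorithm. The whole data set is first divided into k classes, and a density parameter ϑ is defined that reflects the density of the region in which each data item lies; the value of ϑ is proportional to the density, and the density parameter is used to optimize the selection of the k cluster centers. The Euclidean distance from each remaining data item to each cluster center is then computed and used as the criterion for assigning the item to a class, giving the initial cluster distribution. After the initial distribution is obtained, the center of each cluster is recomputed and compared with the previous center, and this is repeated until the clustering converges. Finally, the improved algorithm is validated on the Wine data set: it clusters better than the traditional K-means algorithm and is better suited to clustering in data mining.
Keywords: data mining; cluster analysis; K-means clustering algorithm; cluster center selection; improved K-means algorithm; initial center points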
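Research on a Density-Based Incremental k-means Clustering Algorithm  Cited by: 5
12
Authors: 司福明 《长春工程学院学报(自然科学版)》 2016, No. 2, pp. 99-102 (4 pages)
The basic principles, strengths, and weaknesses of the k-means and DBSCAN clustering algorithms are introduced. To address the inability of traditional clustering algorithms to handle high-dimensional data sets with mixed attributes effectively, the original data normalization method is improved. Building on k-means and DBSCAN, and combining the idea of incremental clustering with a method for computing the dissimilarity between data items, a density-based incremental k-means clustering algorithm is proposed that handles high-dimensional mixed-attribute data sets effectively and improves the computation of data dissimilarity.
Keywords: k-means clustering algorithm; improvement; data dissimilarity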
Improvement and Parallelism of k-Means Clustering Algorithm  Cited by: 2
13
Authors: 田金兰, 朱林, 张素琴, 刘璐 《Tsinghua Science and Technology》 SCIE, EI, CAS, 2005, No. 3, pp. 277-281 (5 pages)
The k-means clustering algorithm is one of the most commonly used algorithms for clustering analysis. The traditional k-means algorithm is, however, inefficient when working on large numbers of data sets, and improving its efficiency remains a problem. This paper focuses on the efficiency of cluster algorithms. A method for refining the initial cluster centers is designed to reduce the number of iterations in the algorithm. A parallel k-means algorithm is also studied to overcome the operating limits of a single-processor machine when given huge data sets. The analytical results demonstrate that these improvements can greatly enhance the efficiency of the k-means algorithm, i.e., allow a large number of data sets to be grouped more accurately and more quickly. The analysis has theoretical and practical importance for work on the improvement and parallelism of cluster algorithms.
Keywords: data mining; cluster analysis; k-means algorithm; parallelism
A Tradeoff Between Accuracy and Speed for K-Means Seed Determination
14
Authors: Farzaneh Khorasani, Morteza Mohammadi Zanjireh, Mahdi Bahaghighat, Qin Xin 《Computer Systems Science & Engineering》 SCIE, EI, 2022, No. 3, pp. 1085-1098 (14 pages)
With the sharp increase in information volume, analyzing and retrieving this vast amount of data is more essential than ever. One of the main techniques that helps in this regard is clustering. Clustering aims to group objects so that all objects within a cluster have similar features while objects in different clusters are as distinct as possible. One of the most widely used clustering algorithms, with proven performance in different applications, is the k-means algorithm. The main problem of the k-means algorithm is that its performance is directly affected by the selection of the initial clusters. Lack of attention to this crucial issue has consequences such as creating empty clusters and slowing convergence. In addition, selecting appropriate initial seeds can reduce inconsistency within clusters. In this paper, we present a new method for determining the initial seeds of the k-means algorithm that improves its accuracy and decreases its number of iterations. The method considers the average distance between objects to determine the initial seeds, and it attempts to provide a proper tradeoff between the accuracy and the speed of the clustering algorithm. The experimental results showed that the proposed approach outperforms Chithra's approach by 1.7% and 2.1% in clustering accuracy on the Wine and Abalone detection data, respectively. Furthermore, the results indicate that the proposed method converges faster than the Reverse Nearest Neighbor (RNN) search approach.
Keywords: data clustering; k-means algorithm; information retrieval; outlier detection; clustering accuracy; unsupervised learning
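Precise Big Data Mining with an Improved K-means Algorithm  Cited by: 2
15
Authors: 蔡瑞瑞 《新乡学院学报》 2021, No. 3, pp. 27-31 (5 pages)
To address the large fluctuation of clustering results and the low clustering purity in traditional data mining, a precise big data mining technique based on an improved K-means algorithm is proposed. The extracted data model is first expressed mathematically and an autoencoder is used to optimize the data features; the similarity of the data set is then computed, a distance metric is chosen, the number of clusters is specified, and the optimal solution is obtained over repeated computations. With the improved K-means algorithm, the points with the largest local density in the data set are taken as the cluster centers. After the Euclidean distances of the data samples are computed, the clustering result is obtained through repeated iterations. The improved K-means algorithm is compared with three traditional algorithms in data mining applications. The experimental results show that its result curve fluctuates less and its clustering purity is markedly higher than that of the traditional algorithms.
Keywords: improved K-means algorithm; clustering results; cluster mining; big data; autoencoder; K-means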
Clustering: from Clusters to Knowledge
16
Authors: Peter Grabusts 《Computer Technology and Application》 2013, No. 6, pp. 284-290 (7 pages)
Data analysis and automatic processing is often interpreted as knowledge acquisition. In many cases it is necessary to classify data in some way or to find regularities in them. Results obtained in the search for regularities in intelligent data analysis applications are mostly represented with IF-THEN rules. These rules are used to solve tasks such as prediction, classification, and pattern recognition. Using different approaches (clustering algorithms, neural network methods, fuzzy rule processing methods), we can extract rules that characterize the data in an understandable language. This allows interpreting the data, finding relationships in the data, and extracting new rules that characterize them. Knowledge acquisition in this paper is defined as the process of extracting knowledge from numerical data in the form of rules. Rule extraction in this context is based on the clustering methods K-means and fuzzy C-means. With the K-means clustering algorithm, rules are derived from trained neural networks. Fuzzy C-means is used in a fuzzy-rule-based design method. The rule extraction methodology is demonstrated on samples from Fisher's Iris flower data set. The effectiveness of the extracted rules is evaluated. The clustering and rule extraction methodology can be widely used in evaluating and analyzing various economic and financial processes.
Keywords: data analysis; clustering algorithms; k-means; fuzzy C-means; rule extraction
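Research on a Wind Power-Wind Speed Data Processing Method Based on an Improved Clustering Algorithm
17
Authors: 胡青文, 徐雪璐, 周勃, 王义娜 《辽宁科技学院学报》 2025, No. 1, pp. 30-33, 93 (5 pages)
To address the difficulty of effectively cleaning abnormal wind power-wind speed data, an improved wind power data cleaning method based on the DBSCAN clustering algorithm is proposed, using SCADA data from an actual wind farm as an example. First, the data are classified according to the distribution characteristics of the wind power data and standardized before cleaning, reducing the influence of data magnitude on clustering. Then a slope-change method is applied to the computed k-distances to determine the Eps value of the DBSCAN algorithm, and clustering is carried out. Finally, the correlation coefficients of the wind power data before and after cleaning are computed to verify the clustering result and assess the quality of the cleaning. The results show that after cleaning with the improved method the correlation coefficient of the wind power data increases by 0.234, a marked improvement; the cleaning is especially effective when the abnormal data are scattered. The method resolves the difficulty of choosing the threshold in the traditional DBSCAN algorithm and is reasonably general for cleaning abnormal power data under different wind farm conditions.
Keywords: wind turbine; wind power data; data cleaning; clustering algorithm; improved DBSCAN method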
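Choosing Eps from the k-distances, as the abstract above describes, can be sketched as follows. This is a minimal sketch under stated assumptions: the k-distance of a point is its Euclidean distance to its k-th nearest neighbor, and the slope change is taken to be the largest jump between consecutive sorted k-distances. The paper's actual slope criterion is not given here, and the function names and sample points are hypothetical.

```python
import math

def k_distances(points, k):
    """Sorted distances from each point to its k-th nearest neighbor (requires k < len(points))."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)

def eps_from_slope_jump(sorted_kdist):
    """k-distance just before the largest jump in the sorted curve (assumed criterion)."""
    jumps = [b - a for a, b in zip(sorted_kdist, sorted_kdist[1:])]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return sorted_kdist[i]

# Hypothetical standardized samples: a dense square plus one isolated point.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
curve = k_distances(samples, 2)
print(curve)
print(eps_from_slope_jump(curve))
```

Points in dense regions have small, similar k-distances, while isolated points have much larger ones, so the place where the sorted curve turns sharply upward is a natural candidate for Eps.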
Improved algorithm of cluster-based routing protocols for agricultural wireless multimedia sensor networks
18
Authors: Zhang Fu, Liu Hongmei, Wang Jun, Qiu Zhaomei, Mao Pengjun, Zhang Yakun 《International Journal of Agricultural and Biological Engineering》 SCIE, EI, CAS, 2016, No. 4, pp. 132-140 (9 pages)
Low Energy Adaptive Clustering Hierarchy (LEACH) is a routing algorithm in agricultural wireless multimedia sensor networks (WMSNs) that includes two kinds of improved protocol, LEACH_D and LEACH_E. In this study, obstacles in these widely used protocols were overcome. An improved algorithm was proposed to solve existing problems such as energy source restriction, communication distance, and the energy of the nodes. The optimal number of clusters was calculated with the first-order radio model of the improved algorithm to determine the percentage of cluster heads in the network. High-energy nodes near the sink were chosen as cluster heads based on the residual energy of the nodes and the distance between the nodes and the sink node. At the same time, the K-means clustering analysis method was used to assign the nodes equally to several clusters in the network. Both simulation and verification results showed that the number of surviving nodes under the proposed algorithm LEACH-ED increased by 66%. Moreover, the network load was high and the network lifetime was longer. The mathematical model relating the average voltage of the nodes (y) to the running time (x) was y = −0.0643x + 4.3694, with a correlation coefficient of R² = 0.9977. The research results can provide a foundation and method for the design and simulation of routing algorithms in agricultural WMSNs.
Keywords: wireless sensor networks; routing protocol; LEACH algorithm; improved algorithm; cluster head; k-means clustering
A Clustering Algorithm for Mixed Numeric and Categorical Data
19
Authors: Ohn Mar San, Van-Nam Huynh, Yoshiteru Nakamori 《Journal of Systems Science & Complexity》 SCIE, EI, CSCD, 2003, No. 4, pp. 562-571 (10 pages)
Most of the earlier work on clustering focused mainly on numeric data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve data sets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both the numeric and the categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method for updating the "cluster centers" of objects described by mixed numeric and categorical attributes during clustering so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, the credit approval and abalone databases.
Keywords: cluster analysis; numeric data; categorical data; k-means algorithm
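A dissimilarity that draws on both kinds of attribute, as described in the abstract above, can be sketched as follows. This is a minimal sketch using a common formulation, squared Euclidean distance on the numeric part plus a weighted count of categorical mismatches. It is an assumption that the paper's measure has this shape, and the function name, the weight `gamma`, and the sample records are hypothetical.

```python
def mixed_dissimilarity(x_num, x_cat, y_num, y_cat, gamma=1.0):
    """Squared Euclidean distance on numeric attributes plus gamma times categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    categorical = sum(1 for a, b in zip(x_cat, y_cat) if a != b)
    return numeric + gamma * categorical

# Hypothetical records: two numeric attributes and two categorical attributes each.
a_num, a_cat = (1.0, 2.0), ("owner", "urban")
b_num, b_cat = (2.0, 4.0), ("owner", "rural")
print(mixed_dissimilarity(a_num, a_cat, b_num, b_cat, gamma=0.5))
```

With no categorical attributes the measure reduces to the squared Euclidean distance that k-means minimizes, which matches the abstract's remark that the algorithm is identical to k-means on purely numeric data.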
Study on the Grouping of Patients with Chronic Infectious Diseases Based on Data Mining
20
Authors: Min Li 《Journal of Biosciences and Medicines》 2019, No. 11, pp. 119-135 (17 pages)
Objective: Following the RFM model theory of customer relationship management, data mining technology was used to group patients with chronic infectious diseases and to explore the effect of customer segmentation on the management of patients with different characteristics. Methods: 170,246 outpatient records were extracted from the hospital management information system (HIS) for January 2016 to July 2016; 43,448 records remained after data cleaning. The K-Means clustering algorithm was used to classify patients with chronic infectious diseases, and the C5.0 decision tree algorithm was then used to predict the situation of these patients. Results: Male patients accounted for 58.7%, and patients living in Shanghai accounted for 85.6%. The average age of the patients was 45.88 years, and the high-incidence age range was 25 to 65 years. Patients were grouped into three categories: 1) Cluster 1, important patients (4786 people, 11.0%, R = 2.89, F = 11.72, M = 84,302.95); 2) Cluster 2, major patients (23,103 people, 53.2%, R = 5.22, F = 3.45, M = 9146.39); 3) Cluster 3, potential patients (15,559 people, 35.8%, R = 19.77, F = 1.55, M = 1739.09). The C5.0 decision tree algorithm was used to predict the treatment situation of patients with chronic infectious diseases; the final treatment time (weeks) is an important predictor, and the accuracy rate, verified with the confusion matrix, is 99.94%. Conclusion: Medical institutions should strengthen adherence education for patients with chronic infectious diseases, establish a chronic infectious disease and customer relationship management database, and take the initiative to help patients improve treatment adherence. Chinese governments at all levels should speed up the construction of hospital informatization, establish a chronic infectious disease database, and strengthen the blocking of mother-to-child transmission, in order to curb chronic infectious diseases effectively and reduce disease burden and mortality.
Keywords: data mining; k-means clustering algorithm; C5.0 decision tree algorithm; customer relationship management; patients with chronic infectious disease
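The R, F, and M values reported per cluster in the abstract above come from the RFM model: recency, frequency, and monetary value. The following is a minimal sketch of how such features might be derived from visit records before clustering. The record layout, the use of days as the recency unit, the reference date, and the function name `rfm` are all assumptions; the paper's exact definitions are not given here.

```python
from datetime import date

def rfm(visits, reference):
    """Map each patient id to (days since last visit, number of visits, total spend)."""
    table = {}
    for pid, day, amount in visits:
        last, freq, money = table.get(pid, (day, 0, 0.0))
        table[pid] = (max(last, day), freq + 1, money + amount)
    return {
        pid: ((reference - last).days, freq, money)
        for pid, (last, freq, money) in table.items()
    }

# Hypothetical outpatient records: (patient id, visit date, cost).
visits = [
    ("p1", date(2016, 1, 5), 100.0),
    ("p1", date(2016, 3, 1), 50.0),
    ("p2", date(2016, 7, 1), 20.0),
]
features = rfm(visits, reference=date(2016, 7, 31))
print(features)
```

Each patient then becomes a three-dimensional point that a clustering algorithm such as k-means can group, usually after the three features are rescaled so that the monetary value does not dominate the distance.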