Abstract
This paper studies customer classification. K-means clustering is applied to the customer data of a network company. Because the data contain missing values and the variables are severely collinear, the data are first preprocessed, and variables that suit the classification purpose are selected to obtain a complete data set usable for clustering. A model is then built to classify the customers, with 100 manually classified samples serving as the training set. The author concludes that k-means clustering classifies the company's customer structure with high accuracy and offers useful guidance for classifying customers in the future.
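The workflow summarized in the abstract (remove incomplete records, reduce collinear variables, cluster, compare with manually classified samples) can be sketched roughly as follows. This is only an illustrative sketch and not the author's procedure: the abstract does not say how missing values or collinearity were handled, so row deletion, the 0.95 correlation threshold, the majority-vote agreement measure, the toy data, and all function names here are assumptions.

```python
import math
import random

def drop_incomplete(rows):
    """Keep only rows with no missing (None) values (assumed strategy)."""
    return [r for r in rows if all(v is not None for v in r)]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_collinear(rows, threshold=0.95):
    """Greedily keep a column only if |r| with every kept column is below threshold."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return [[r[j] for j in kept] for r in rows], kept

def kmeans(rows, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
            for r in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, l in zip(rows, labels) if l == c]
            if members:  # leave an empty cluster's center unchanged
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def agreement(cluster_labels, manual_labels):
    """Share of samples whose cluster's majority manual class equals their own class."""
    majority = {}
    for c in set(cluster_labels):
        classes = [t for l, t in zip(cluster_labels, manual_labels) if l == c]
        majority[c] = max(set(classes), key=classes.count)
    hits = sum(majority[l] == t for l, t in zip(cluster_labels, manual_labels))
    return hits / len(manual_labels)

# Hypothetical toy data: column 1 is exactly twice column 0 (collinear),
# and one record has a missing value.
rows = [
    [1.0, 2.0, 2.0], [1.2, 2.4, 1.0], [0.8, 1.6, 1.5],
    [None, 2.0, 1.0],
    [9.0, 18.0, 1.0], [9.3, 18.6, 2.0], [8.8, 17.6, 1.5],
]
manual = ["A", "A", "A", "B", "B", "B"]  # manual classes of the complete rows

clean = drop_incomplete(rows)
reduced, kept = drop_collinear(clean)
labels, centers = kmeans(reduced, k=2)
score = agreement(labels, manual)
```

In practice a library implementation of k-means and a more careful treatment of missing values, such as imputation, would likely be preferable; the sketch only makes the order of the steps concrete.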
Author
ZHU Gui-ling (朱桂玲), School of Mathematics and Statistics, Zhaotong University, Zhaotong 657000, China
Source
Journal of Zhaotong University (《昭通学院学报》)
2019, No. 5, pp. 12-16 (5 pages)
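Funding
Scientific research project of the Yunnan Provincial Department of Education, "Hypothesis Testing for High-Dimensional Data with Missing Data" (2016FB005)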
Keywords
Customer classification
K-means clustering
Missing variables
Collinearity