Abstract
Track circuit fault logs are an important data record in routine on-site operation and maintenance. To address the problem that these logs are not fully exploited in field work and that manual analysis is inefficient, a topic clustering and mining method for track circuit fault text based on the spectral clustering algorithm is proposed. First, the characteristics of track circuit fault text data are analyzed and the text is preprocessed; a Word2vec model is trained to obtain character-level feature vectors, which represent the fault text in semantic space. Second, using the graph spectral clustering properties of the Laplacian matrix, the clustering of high-dimensional fault text features is converted into a spectral graph partitioning problem. For each of the three fault-factor text data sets (signaling, track maintenance, and power supply), the eigenvectors of the normalized Laplacian matrix are computed and used to construct a low-dimensional fault text feature matrix. The K-Means algorithm is then applied to cluster fault text topics in each data set, yielding the track circuit fault topic types and their frequencies contained in the text, and the clustering results are visualized with the t-distributed stochastic neighbor embedding algorithm. Finally, comparative experiments with different clustering models are conducted on the three data sets. The experimental results show that the spectral clustering model has better convergence performance while maintaining clustering accuracy, and the visualization results confirm that the fault topic categories obtained have high semantic discrimination. Automated clustering, mining, and statistical analysis of track circuit fault text with this method can provide auxiliary support for comprehensive on-site track circuit maintenance and fault prevention.
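The clustering step described in the abstract (normalized Laplacian, eigenvectors, low-dimensional feature matrix, K-Means) can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors below only stand in for Word2vec text embeddings, the Gaussian affinity kernel and its width `sigma` are assumptions, and the function names `spectral_embed`, `kmeans`, and `spectral_cluster` are hypothetical.

```python
import numpy as np

def spectral_embed(X, k, sigma=1.0):
    """Map the rows of X to a k-dimensional spectral embedding."""
    # Gaussian (RBF) affinity between every pair of vectors (assumed kernel).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Row-normalize so each sample lies on the unit sphere.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(U, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point seeding."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each row to its nearest center.
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        # Recompute centers; keep the old one if a cluster is empty.
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_cluster(X, k, sigma=1.0):
    """Spectral embedding followed by K-Means on the embedded rows."""
    return kmeans(spectral_embed(X, k, sigma), k)

# Synthetic stand-in for text embeddings: two well-separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(10, 5))
group_b = rng.normal(3.0, 0.1, size=(10, 5))
X = np.vstack([group_a, group_b])

labels = spectral_cluster(X, k=2)
print(labels)
```

On real fault text, the rows of `X` would come from the trained embedding model, and the number of clusters and the kernel width would need to be chosen per data set.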
Authors
YAO Xinwen (姚新文); HOU Tong (侯通); ZHENG Qiming (郑启明); WANG Xiaomin (王小敏)
(School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Transportation & Economics Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China; Sichuan Province Train Operation Control Technology Engineering Research Center, Chengdu 611756, China)
Source
Journal of Lanzhou Jiaotong University (《兰州交通大学学报》)
CAS
2024, No. 1, pp. 64-72 (9 pages)
Funding
Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (L2022G004, P2021G053).