Abstract
Fuzzy natural language processing applies fuzzy theory to natural language processing (NLP) tasks. With the continued development of large models and artificial intelligence, research on text data keeps deepening. The railway centralized traffic control (CTC) system is a large, complex system in which the interface data exchanged between subsystems and server software are stored and transmitted as log text. Because these logs are numerous and varied in type, a fuzzy NLP method is proposed to address the difficulty of manually testing CTC interface data. The fuzzy C-means (FCM) clustering algorithm divides the log text into label categories, which serve as the label input for named entity recognition. BERT is introduced into the conventional BiLSTM-CRF model for text encoding, so that relationships within the text are captured more accurately and recognition precision improves. Based on the trained model, an intelligent verification tool for log-text interface testing of the railway CTC system was developed. It improves on the current manual testing practice, assists testers in verifying interfaces, and raises the level of intelligence and automation in testing work.
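The FCM step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each log line has already been turned into a numeric feature vector; the abstract does not say how the authors vectorize the logs, so the toy `points` below, the function name `fuzzy_c_means`, and all parameter defaults are hypothetical. The code shows only the standard FCM update rules (fuzzifier `m`, alternating center and membership updates), not the authors' implementation.

```python
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means on a list of equal-length numeric vectors.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j (each row sums to 1).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(n_clusters)
            ]
            inv = [dist ** (-2.0 / (m - 1.0)) for dist in dists]
            inv_sum = sum(inv)
            new_u.append([v / inv_sum for v in inv])

        # Stop once the memberships no longer change noticeably.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(n_clusters))
        u = new_u
        if change < tol:
            break

    return centers, u


# Hypothetical 2-D features standing in for vectorized log lines.
points = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [5.0, 5.1], [5.1, 5.0], [4.95, 5.05]]
centers, memberships = fuzzy_c_means(points, n_clusters=2)

# A hard label per log line is the cluster with the highest membership.
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Taking the highest-membership cluster for each line gives one label category per log entry, which is the kind of label a downstream named entity recognition model could consume. The soft memberships themselves remain available when a line sits between categories.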
Authors
JIAO Yuantao (角远韬); LI Runmei (李润梅); WANG Jian (王剑)
School of Automation and Intelligence, Beijing Jiaotong University, Beijing 100044, China
Source
Chinese Journal of Intelligent Science and Technology (《智能科学与技术学报》), CSCD
2024, Issue 2, pp. 201-209 (9 pages)
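Funding
National Key Research and Development Program of China (No. 2022YFB4300500)
Science and Technology Research and Development Program of China State Railway Group Co., Ltd. (No. L2022X002)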
Keywords
natural language processing
fuzzy text clustering
railway centralized traffic control system
named entity recognition
intelligent testing