Abstract
The development of power informatization keeps increasing the volume of data in power marketing systems. This weakens data conversion during the extraction process and leads to a high recall rate in the extraction results. To address this problem, a new intelligent extraction method for power marketing data is designed that exploits the transformation capability of the extensible markup language (XML). The power marketing data are first standardized into small-range data chains, and the hyperlink-induced topic search (HITS) algorithm is applied to obtain the data sources. An XML data conversion tool is then configured, and XML location descriptors are used to locate the data regions. Based on the defined extraction rules and extraction contents, the power marketing data are extracted with data mapping technology. Performance was tested in two environments: stable operation and data intrusion. The comparison shows that the XML-based extraction method keeps the recall rate below 7% and the extraction time below 800 ms, both better than the values of the traditional method, which demonstrates the effectiveness of the method.
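The abstract names two concrete mechanisms: HITS to rank candidate data sources, and XML location descriptors plus data mapping to pull fields out of a located data region. The paper's exact procedure is not given here, so the following is only a minimal sketch under stated assumptions: candidate sources are modeled as nodes in a reference graph (`links`), Python's `xml.etree.ElementTree` path subset stands in for the XML location descriptors, and the element names, `region_path` and `FIELD_MAP` are hypothetical examples rather than the author's schema.

```python
import math
import xml.etree.ElementTree as ET


def hits(links, iterations=50):
    """Iterative HITS over a directed graph {source: [targets]}.

    authority(p) = sum of hub(q) for every q -> p
    hub(p)       = sum of authority(r) for every p -> r
    Both vectors are L2-normalized after each update.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update from the current hub scores
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub update from the freshly normalized authority scores
        hub = {n: sum(auth[dst] for dst in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical mapping: path relative to each record -> target column name
FIELD_MAP = {"consNo": "customer_id", "usage/kwh": "energy_kwh", "bill/amount": "amount_yuan"}


def extract_records(xml_text, region_path=".//record", field_map=FIELD_MAP):
    """Locate the data region with a path expression, then map fields to columns."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.findall(region_path):  # data region location
        row = {}
        for path, column in field_map.items():  # data mapping
            node = rec.find(path)
            row[column] = node.text.strip() if node is not None and node.text else None
        rows.append(row)
    return rows


# Hypothetical sample document; the second record has no <bill> element
sample = """<marketing>
  <record><consNo>1001</consNo><usage><kwh>352.5</kwh></usage><bill><amount>186.8</amount></bill></record>
  <record><consNo>1002</consNo><usage><kwh>120.0</kwh></usage></record>
</marketing>"""

if __name__ == "__main__":
    hub, auth = hits({"A": ["C"], "B": ["C"], "D": ["C", "A"]})
    print(max(auth, key=auth.get))  # prints C (most-referenced candidate source)
    for row in extract_records(sample):
        print(row)
```

Missing fields come back as `None` rather than raising, so an incomplete record can still be flagged by whatever extraction rules sit downstream.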
Author
YU Xiangqian (State Grid Gansu Electric Power Company, Lanzhou 730030, China)
Source
Process Automation Instrumentation (《自动化仪表》), 2023, No. 1, pp. 92-95, 100 (5 pages)
Indexed in CAS
Keywords
Extensible markup language(XML)
Power marketing data
Information security
Data extraction
Data transformation
Data region location
Extraction rules
Data mapping
Recall rate