Abstract
With the rapid development of the information age, Web data mining is becoming increasingly important, yet traditional search engines are poorly suited to mining and processing such data. Exploiting the well-defined structure and hierarchy of XML, this paper proposes a Web mining method based on the DOM tree. Web data is first converted into well-formed XML files with the Tidy tool library, which simplifies construction of the DOM tree; the required Web information is then extracted by traversing and parsing the XML DOM tree, thereby achieving Web data mining. Experiments show that the method makes structured storage and information processing of the data convenient.
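The traversal-and-extraction step described in the abstract might look roughly like the following minimal sketch. It is not the authors' implementation. It assumes the Tidy conversion has already produced well-formed XHTML, represented here by the hypothetical `TIDIED_PAGE` string, and it uses Python's standard `xml.dom.minidom` parser; the helper names `text_of` and `extract_links` are illustrative only.

```python
from xml.dom import minidom

# Assumption: this string stands in for the output of the Tidy step,
# i.e. a page that is already well-formed XML (hypothetical sample data).
TIDIED_PAGE = """<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>News</h1>
    <p>First item, see <a href="http://example.com/a">link A</a>.</p>
    <p>Second item, see <a href="http://example.com/b">link B</a>.</p>
  </body>
</html>"""


def text_of(node):
    """Concatenate the text of all text-node descendants, depth first."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def extract_links(node, found=None):
    """Recursively walk the DOM tree and collect (href, anchor text) pairs."""
    if found is None:
        found = []
    if node.nodeType == node.ELEMENT_NODE and node.tagName == "a":
        found.append((node.getAttribute("href"), text_of(node).strip()))
    for child in node.childNodes:
        extract_links(child, found)
    return found


# Build the DOM tree, then extract the information of interest from it.
document = minidom.parseString(TIDIED_PAGE)
links = extract_links(document.documentElement)
title = text_of(document.getElementsByTagName("title")[0])
```

Here `links` holds the structured records recovered from the page and `title` the page title; in a fuller pipeline these would be written to structured storage, and the input would come from running Tidy on real HTML rather than from a literal string.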
Source
《四川理工学院学报(自然科学版)》
CAS
2013, No. 3, pp. 64-67 (4 pages)
Journal of Sichuan University of Science & Engineering(Natural Science Edition)
Funding
Graduate Innovation Fund Project of Sichuan University of Science & Engineering (y2012007)