
Representative Structures from XML Documents Based on Clustering Techniques

Cited by: 4
Abstract: Because an XML document can be represented as a tree, the problem of clustering a collection of XML documents can be treated as the problem of clustering tree-structured data. The author uses a SOM (Self-Organizing Map) together with the Jaccard coefficient as the distance measure to cluster the XML documents. Within each cluster, GST (Graph Search Technique), an efficient sequential pattern mining algorithm, is then applied to find the maximum frequent sequences of the documents. Finally, these maximum frequent sequences are merged to form the common structure of the cluster.
Author: 卓月明 (ZHUO Yueming)
Source: Journal of Jishou University (Natural Sciences Edition) (《吉首大学学报(自然科学版)》), CAS-indexed, 2011, No. 6, pp. 55-58 (4 pages)
Keywords: XML document; tree structure; clustering; sequential pattern mining; common structure
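
As an illustration of the distance measure named in the abstract, the following minimal sketch computes a Jaccard distance between two XML documents. The abstract does not say how each tree is turned into a feature set, so representing a document by its set of root-to-node tag paths is an assumption made here, and the names `tag_paths` and `jaccard_distance` are hypothetical. It is not the author's implementation, and it does not cover the SOM or GST steps.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document.

    Assumption: a document's tree is summarised by its distinct tag paths;
    the source abstract does not specify the feature representation.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard_distance(a, b):
    """Jaccard distance 1 - |A & B| / |A | B| between two sets."""
    union = a | b
    if not union:
        # Two empty sets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Three toy documents: two similar book records and one unrelated order.
doc_a = "<book><title/><author><name/></author></book>"
doc_b = "<book><title/><author><name/></author><year/></book>"
doc_c = "<order><item><price/></item></order>"

paths_a, paths_b, paths_c = (tag_paths(d) for d in (doc_a, doc_b, doc_c))
print(jaccard_distance(paths_a, paths_b))  # small: the structures mostly overlap
print(jaccard_distance(paths_a, paths_c))  # large: no shared tag paths
```

A pairwise distance of this kind is what a clustering tool would consume: structurally similar documents receive a small distance and are more likely to fall into the same cluster.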

