Representative Structures from XML Documents Based on Clustering Techniques

Cited by: 4

Abstract: Since an XML document can be represented as a tree structure, the problem of clustering a collection of XML documents can be treated as the problem of clustering a collection of tree-structured data. The author uses a self-organizing map (SOM) with the Jaccard coefficient as the distance measure to cluster the XML documents. Within each cluster, an efficient sequential pattern mining method, GST (Graph Search Technique), is then applied to find the maximal frequent sequences among the documents. Finally, these maximal frequent sequences are merged to produce the common structure of the cluster.
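The abstract does not say how each document is turned into a set for the Jaccard coefficient. The sketch below assumes one common choice: represent an XML document by the set of its root-to-node tag paths, then measure the Jaccard distance between two such sets. It also includes a naive frequent-path filter to illustrate what a cluster's "common structure" might contain; that filter is a simplified stand-in, not the GST algorithm, and the SOM clustering step itself is omitted. The names `tag_paths`, `jaccard_distance`, and `frequent_paths` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(xml_text: str) -> set[str]:
    """Return the set of root-to-node tag paths, e.g. 'book/author/name'.

    Assumption: a document's structure is summarized by its distinct
    label paths; text content and attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    paths: set[str] = set()

    def walk(node: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def frequent_paths(cluster: list[set[str]], min_support: float) -> set[str]:
    """Paths present in at least a min_support fraction of the cluster's documents.

    Hypothetical, naive stand-in for a common-structure step; this is NOT GST.
    """
    counts = Counter(p for doc in cluster for p in doc)
    threshold = min_support * len(cluster)
    return {p for p, c in counts.items() if c >= threshold}


if __name__ == "__main__":
    docs = [
        "<book><title/><author><name/></author></book>",
        "<book><title/><author><name/><email/></author></book>",
        "<book><title/><year/></book>",
    ]
    sets = [tag_paths(d) for d in docs]
    print(round(jaccard_distance(sets[0], sets[1]), 3))  # 0.2
    print(round(jaccard_distance(sets[0], sets[2]), 3))  # 0.6
    print(sorted(frequent_paths(sets[:2], min_support=1.0)))
    # ['book', 'book/author', 'book/author/name', 'book/title']
```

The first two documents differ only by an extra `email` element, so they share 4 of 5 paths and their distance is 0.2. In the pipeline described in the abstract, the Jaccard measure would be used inside the SOM clustering step; the path-set representation is a choice made here for brevity and may differ from the paper's.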

Author: Zhuo Yueming (卓月明)

Affiliation: School of Software Service Outsourcing, Jishou University

Source: Journal of Jishou University (Natural Sciences Edition), indexed in CAS, 2011, No. 6, pp. 55-58 (4 pages)

Keywords: XML document; tree structure; clustering; sequential pattern mining; common structure

CLC number: TP3 [Automation and Computer Technology: Computer Science and Technology]
