DCDGrid: A Grid-Based Document Copy Detection System

Cited by: 7
Abstract: Finding plagiarized documents in a very large corpus quickly and in a timely manner is a core problem in intellectual property protection. Using Globus, we built a grid-based document copy detection system called DCDGrid. In the DCDGrid prototype, a single huge corpus is divided into several small and medium-sized corpora, which are then distributed across the network. Through grid computing, plagiarism can be detected on several computers simultaneously, the corpus used for detection can be expanded dynamically, and detection time is shortened, so the system as a whole offers a high performance-to-cost ratio. Simulation tests on a LAN show that DCDGrid is reasonably practical.
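The partition-and-query idea described in the abstract can be illustrated with a minimal single-machine sketch. It is an assumption-laden illustration, not the DCDGrid implementation: a thread pool stands in for the grid nodes that DCDGrid dispatches work to through Globus, the round-robin partitioning is hypothetical, and the similarity measure (containment of word 3-gram "shingles", a common copy-detection measure) is swapped in because the abstract does not say which measure the system uses. The helper names `partition`, `search_shard`, and `detect`, and the `threshold` and `k` parameters, are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(query, doc, k=3):
    """Fraction of the query's shingles that also occur in doc (0.0 to 1.0)."""
    q = shingles(query, k)
    if not q:
        return 0.0
    return len(q & shingles(doc, k)) / len(q)


def partition(corpus, n):
    """Hypothetical step: split one large corpus (id -> text) into n sub-corpora.

    Documents are assigned round-robin in sorted-id order; a real deployment
    would place each sub-corpus on a different machine.
    """
    shards = [{} for _ in range(n)]
    for i, (doc_id, text) in enumerate(sorted(corpus.items())):
        shards[i % n][doc_id] = text
    return shards


def search_shard(shard, query, threshold, k=3):
    """Work done by one (simulated) node: scan only its own sub-corpus."""
    hits = []
    for doc_id, text in shard.items():
        score = containment(query, text, k)
        if score >= threshold:
            hits.append((doc_id, score))
    return hits


def detect(shards, query, threshold=0.5, k=3):
    """Query every sub-corpus concurrently and merge the per-node results.

    Threads stand in for grid nodes here; results are sorted by score,
    highest first.
    """
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(search_shard, s, query, threshold, k) for s in shards]
        hits = [hit for f in futures for hit in f.result()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    corpus = {
        "d1": "the quick brown fox jumps over the lazy dog",
        "d2": "grid computing lets many machines share one detection task",
        "d3": "a quick brown fox jumps over a sleeping cat",
        "d4": "intellectual property protection needs fast plagiarism detection",
    }
    shards = partition(corpus, 2)
    print(detect(shards, "the quick brown fox jumps over the lazy dog today"))
```

In this toy run the query shares 7 of its 8 shingles with `d1`, so only `d1` (score 0.875) clears the default threshold of 0.5. Because each sub-corpus is searched independently, adding a machine amounts to adding another shard, which mirrors the abstract's claim that the detection corpus can be enlarged dynamically.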
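Source: Microelectronics & Computer (《微电子学与计算机》), indexed in CSCD and the Peking University Core Journals list, 2004, No. 9, pp. 7-10 (4 pages)
Funding: National Natural Science Foundation of China (60173058); Xi'an Jiaotong University Research Fund (573031)
Keywords: copy detection; grid; plagiarism; text mining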