Abstract
Finding plagiarized documents in a very large corpus quickly and in a timely manner is a central open problem in intellectual property protection. We used Globus to build a grid-based document copy detection system called DCDGrid. In the DCDGrid prototype, a single huge corpus is divided into several small- and medium-sized corpora, which are then distributed across the network. Through grid computing, plagiarism can be detected on several computers simultaneously. The grid infrastructure also makes it possible to enlarge the corpus dynamically and to shorten detection time, and it gives the system a high performance-to-cost ratio. A simulated test on a LAN shows that DCDGrid is practical.
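The abstract does not state DCDGrid's matching algorithm or show its Globus job code. The Python sketch below therefore illustrates only the split, distribute, and detect-in-parallel idea, under loudly stated assumptions. Documents are compared by Jaccard similarity of word 3-gram shingles, a stand-in chosen for illustration and not taken from the paper. Each grid node is simulated by a worker thread rather than a Globus-managed machine. The names `partition`, `detect_local`, and `detect_grid` and the `threshold` parameter are all hypothetical.

```python
"""Hypothetical scatter-gather sketch of grid-style copy detection.

Assumptions (not from the paper): similarity is the Jaccard index of
word 3-gram shingles, and each grid "node" is simulated by a thread.
"""
from concurrent.futures import ThreadPoolExecutor


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word k-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def partition(corpus: dict[str, str], n_parts: int) -> list[dict[str, str]]:
    """Split one large corpus into n_parts smaller sub-corpora, round-robin."""
    parts: list[dict[str, str]] = [{} for _ in range(n_parts)]
    for i, (doc_id, text) in enumerate(corpus.items()):
        parts[i % n_parts][doc_id] = text
    return parts


def detect_local(query: str, sub_corpus: dict[str, str],
                 threshold: float) -> list[tuple[str, float]]:
    """Work done on one node: score the query against its local sub-corpus only."""
    q = shingles(query)
    hits = []
    for doc_id, text in sub_corpus.items():
        score = jaccard(q, shingles(text))
        if score >= threshold:
            hits.append((doc_id, score))
    return hits


def detect_grid(query: str, sub_corpora: list[dict[str, str]],
                threshold: float = 0.5) -> list[tuple[str, float]]:
    """Scatter the query to every sub-corpus in parallel, then gather and rank hits."""
    with ThreadPoolExecutor(max_workers=len(sub_corpora)) as pool:
        futures = [pool.submit(detect_local, query, sc, threshold)
                   for sc in sub_corpora]
        hits = [h for f in futures for h in f.result()]
    # Highest similarity first; ties keep the order of the sub-corpora.
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

In this sketch, enlarging the corpus just means appending another sub-corpus to the list passed to `detect_grid`. No existing partition has to be rebuilt, which loosely mirrors the "dynamically enlarge the corpus" benefit claimed in the abstract.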
Source
Microelectronics & Computer (《微电子学与计算机》)
CSCD
Peking University Core Journal
2004, No. 9, pp. 7-10 (4 pages)
Funding
National Natural Science Foundation of China (60173058)
Xi'an Jiaotong University Research Fund (573031)
Keywords
Copy detection, Grid, Plagiarism, Text mining