
Research of Improvement of TF*IDF Algorithm Based on the Spam Filtering (基于TF*IDF垃圾邮件过滤改进算法的研究)

Cited by: 2
Abstract: The traditional TF*IDF algorithm is an important method for computing the weights of keywords in documents. This paper analyzes its shortcoming when separating spam from legitimate email: it neglects words that occur repeatedly across one class of documents. Such recurring words are often the most representative features of that class, so their weights should be relatively high. In this situation, however, the traditional TF*IDF algorithm produces the opposite result, a low weight, which falls short of what the designer intends. The paper therefore modifies the traditional TF*IDF formula to increase the weights of these words. Experiments show that the improved algorithm outperforms the traditional one.
Author: CHANG Kai (常凯) (Hubei University of Technology, Wuhan 430068, China)
Affiliation: Hubei University of Technology
Source: Computer Knowledge and Technology (《电脑知识与技术》), 2010, Issue 9, pp. 6928-6930 (3 pages)
Keywords: TF*IDF; weight; classification; spam
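The shortcoming described in the abstract can be illustrated with a small sketch. This record does not reproduce the paper's formulas, so the code assumes the common textbook form, tf = (term count) / (document length) and idf = ln(N / df). The toy corpus and the function names are hypothetical. It shows only the behaviour of the traditional weighting, not the paper's improved formula.

```python
import math

# Assumed textbook definitions; the paper's exact formulas are not given here.
def tf(term, doc):
    """Term frequency: occurrences of term divided by document length."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Inverse document frequency: ln(N / df), or 0.0 if the term is unseen."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# Hypothetical toy corpus: three spam messages and one legitimate message.
spam = [
    ["free", "prize", "click", "free"],
    ["free", "offer", "now"],
    ["free", "loan", "approved"],
]
ham = [["meeting", "agenda", "attached"]]
corpus = spam + ham

# "free" occurs in every spam message, "prize" in only one.
w_free = tf_idf("free", spam[0], corpus)
w_prize = tf_idf("prize", spam[0], corpus)

print(f"free:  {w_free:.4f}")
print(f"prize: {w_prize:.4f}")
```

Under these assumptions, "free" appears twice in the first message and in every spam message, yet it receives a lower weight than "prize", because its high document frequency drives the IDF factor down. This is the effect the abstract identifies: the word most characteristic of the spam class is under-weighted.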