期刊文献+

基于最大熵模型的不良文本识别

Illegitimate Contents Recognition based on Maximun Entropy Model
下载PDF
导出
摘要 构建了一个基于最大熵原理的不良文本识别模型,该模型分为训练和测试两个模块,先从训练语料中抽取特征,利用最大熵方法对特征进行训练,然后使用经过训练的特征,对测试集中的不良文本进行识别,达到了比较满意的识别效果,最后对实验结果进行了分析。 To constructs a model for illegitimate contents recognition, which is based on the maximum entropy principle. The model consists of a training module and a testing module. At first, features are extracted from the training corpus. The maximum entropy principle is employed to train the features. Then the trained features are used to recognize illegitimate contents in the testing set. The experimental results are satisfying and have been analyzed at the end of th is paper.
作者 高峰 张永奎
出处 《电脑开发与应用》 2009年第1期6-8,共3页 Computer Development & Applications
基金 国家自然科学基金资助项目(60475022)
关键词 最大熵模型 特征选择 特征函数 不良文本识别 maximum entropy model, feature selection, feature function, illegitimate contents recognition
  • 相关文献

参考文献6

二级参考文献28

  • 1赵世奇,张宇,刘挺,陈毅恒,黄永光,李生.基于类别特征域的文本分类特征选择方法[J].中文信息学报,2005,19(6):21-27. 被引量:21
  • 2苏金树,张博锋,徐昕.基于机器学习的文本分类技术研究进展[J].软件学报,2006,17(9):1848-1859. 被引量:387
  • 3D. D. Lewis. Naive (Bayes) at forty: The independence assumption in information retrieval. In: Proc. of the 10th European Conf. on Machine Learning. New York: Springer,1998, 4-15.
  • 4Y. Yang, X. Lin. A re-examination of text categorization methods. In: The 22nd Annual Int'l ACM SIGIR Conf. onResearch and Development in the Information Retrieval. NewYork: ACM Press, 1999.
  • 5Y. Yang, C. G. Chute. An example based mapping method for text categorization and retrieval. ACM Trans. on Information Systems, 1994, 12(3): 252 -277.
  • 6E. Wiener. A neural network approach to topic spotting. The 4th Annual Syrup. on Document Analysis and Information Retrieval,Las Vegas, NV, 1995.
  • 7R. E. Schapire, Y. Singer. Improved boosting algorithms using confidence-rated predications. In: Proc. of the 11th Annual Conf.on Computational Learning Theory. New York: ACM Press,1998. 80--91.
  • 8T. Joachims. Text categorization with support vector machines:Learning with many relevant features. In: Proc. of the 10th European Conf. on Machine Learning. New York: Springer,1998. 137-142.
  • 9Y. Yang. An evaluation of statistical approaches to text categorization. Information Retrieval, 1999, 1 ( 1 ) : 76-- 88.
  • 10R. Adwait. Maximum entropy models for natural language ambiguity resolution: [ Ph. D. dissertation ] . Pennsylvania:University of Pennsylvania, 1998.

共引文献186

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部