Abstract
This paper constructs a model for recognizing illegitimate text content based on the maximum entropy principle. The model consists of a training module and a testing module. Features are first extracted from the training corpus, and the maximum entropy method is applied to train the feature weights. The trained features are then used to recognize illegitimate texts in the test set. The model achieves fairly satisfactory recognition results, and the experimental results are analyzed at the end of the paper.
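The train-then-test pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy corpus, the labels "bad" and "ok", the function names, and the hyperparameters are hypothetical. The weights are fitted by plain gradient ascent on the conditional log-likelihood with a small L2 penalty, whereas the abstract does not say which estimation procedure the paper uses.

```python
import math
from collections import defaultdict

LABELS = ["bad", "ok"]  # hypothetical labels: illegitimate vs. normal text

def extract_features(text, label):
    """Binary feature functions f(x, y): fire when a word of x is paired with label y."""
    return {(word, label) for word in text.lower().split()}

def predict_proba(weights, text, labels):
    """Maximum entropy form: p(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x)."""
    scores = {y: sum(weights.get(f, 0.0) for f in extract_features(text, y))
              for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train(corpus, labels, epochs=200, lr=0.5, l2=0.01):
    """Gradient ascent: gradient = empirical feature count - expected feature count."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for text, gold in corpus:
            for f in extract_features(text, gold):
                grad[f] += 1.0                      # empirical count
            probs = predict_proba(weights, text, labels)
            for y in labels:
                for f in extract_features(text, y):
                    grad[f] -= probs[y]             # expected count under the model
        for f, g in grad.items():
            weights[f] += lr * (g / len(corpus) - l2 * weights[f])
    return dict(weights)

def classify(weights, text, labels):
    """Return the label with the highest conditional probability."""
    probs = predict_proba(weights, text, labels)
    return max(probs, key=probs.get)

# Hypothetical toy training corpus (not data from the paper).
train_corpus = [
    ("win free money now", "bad"),
    ("free prize click now", "bad"),
    ("meeting schedule for monday", "ok"),
    ("project report due monday", "ok"),
]

weights = train(train_corpus, LABELS)
print(classify(weights, "free money prize", LABELS))
print(classify(weights, "monday project meeting", LABELS))
```

Using word-label pairs as binary feature functions keeps the sketch short. A real system would add the feature selection step named in the keywords before training, for example by discarding rare or uninformative features.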
Source
《电脑开发与应用》
2009, No. 1, pp. 6-8 (3 pages)
Computer Development & Applications
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60475022)
Keywords
maximum entropy model
feature selection
feature function
illegitimate content recognition