A Study on the Construction of Japanese Text Classification Model Based on Linear Analysis (基于线性分析的日语文本分类模型构建研究)
Cited by: 4
Abstract: This paper takes a training set of texts covering daily conversations, conference speeches, novels, argumentative essays, government white papers, and news reports, and computes 22 indicators for each text, such as the noun ratio, the numeral ratio, and the ratio of sentences containing conjunctions. With these indicators as the text representation, a linear analysis selects 14 indicators with significant discriminating power and determines their weights, yielding a text classification model based on Bayes classification functions. The 14 selected indicators show that, in addition to vocabulary-proportion measures such as the noun ratio, indicators such as sentence length are also an effective basis for text classification. When evaluated on a test set, the model's precision exceeded 85% and its recall exceeded 81% in the vast majority of cases, so the model reaches relatively high classification accuracy at a small computational cost.
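As an illustration of the kind of model the abstract describes, the following is a minimal sketch of linear (Bayes) classification functions fitted with a pooled within-class covariance matrix. It is not the author's implementation: this record does not list the 14 selected indicators or their weights, so the two features used here (noun ratio and mean sentence length), the class labels, and all numeric values are invented for illustration, and the function names are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def pooled_covariance(groups, means):
    """Pooled within-class covariance, divided by (N - number of classes)."""
    d = len(next(iter(means.values())))
    cov = [[0.0] * d for _ in range(d)]
    for label, rows in groups.items():
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, means[label])]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    dof = sum(len(rows) for rows in groups.values()) - len(groups)
    return [[v / dof for v in row] for row in cov]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]

def fit(groups):
    """Return {label: (weights, constant)} for each linear classification function.

    score_k(x) = w_k . x + c_k, with w_k = inv(S) mu_k and
    c_k = -0.5 * mu_k . w_k + ln(prior_k); priors are class frequencies.
    """
    means = {k: mean_vector(v) for k, v in groups.items()}
    cov = pooled_covariance(groups, means)
    total = sum(len(v) for v in groups.values())
    model = {}
    for label, mu in means.items():
        w = solve(cov, mu)
        c = -0.5 * sum(wi * mi for wi, mi in zip(w, mu))
        c += math.log(len(groups[label]) / total)
        model[label] = (w, c)
    return model

def classify(model, x):
    """Assign x to the class whose classification function scores highest."""
    scores = {k: sum(wi * xi for wi, xi in zip(w, x)) + c
              for k, (w, c) in model.items()}
    return max(scores, key=scores.get)

# Invented toy data: [noun ratio, mean sentence length] per text.
training = {
    "conversation": [[0.20, 8.0], [0.22, 9.5], [0.18, 7.0], [0.25, 10.0]],
    "white_paper": [[0.40, 35.0], [0.42, 40.0], [0.38, 33.0], [0.45, 38.0]],
}
model = fit(training)
print(classify(model, [0.21, 9.0]))
print(classify(model, [0.41, 36.0]))
```

Each class gets one linear function of the indicators, and a text is assigned to the class with the highest score, which is why classification itself costs only a handful of multiplications per class once the weights are fixed.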
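Author: MAO Wen-wei (毛文伟), Office of Research Affairs, Shanghai International Studies University, Shanghai 200083, China
Affiliation: Shanghai International Studies University
Source: Technology Enhanced Foreign Language Education (《外语电化教学》), CSSCI, PKU Core Journal, 2019, No. 6, pp. 97-102, 112 (7 pages)
Funding: Interim result of the 2019 National Social Science Fund of China project "A Study of the Cognitive Mechanisms of Chinese Learners of Japanese Based on Data Mining Techniques" (Project No. 19BYY201)
Keywords: Text Classification; Linear Analysis; Japanese; Text Features; Bayes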