Research on the Threshold of High-Frequency Words in Word Frequency Analysis Based on the Normal Distribution (cited by: 60)
Abstract: Defining high-frequency keywords or subject terms is an essential basis for information analysis using word frequency analysis. Based on a statistical review of the literature, this paper first summarizes the four methods currently used to determine high-frequency words: the TOPN method, the WF>=M method, the %WF=P method, and the T formula. These methods suffer from reliance on experience, arbitrariness, and weaknesses in theoretical basis and applicability. The paper then empirically verifies that keywords and subject terms in literature databases follow a normal distribution and, drawing on the properties of that distribution, proposes the F formula for calculating the high-frequency word threshold. Finally, the F and T methods are compared across multiple data samples, and the paper concludes that the F formula, based on the normal distribution, performs well in both theoretical basis and applicability.
Author: An Xingru (安兴茹)
Source: Journal of Intelligence (《情报杂志》), CSSCI, Peking University Core Journal; 2014, No. 10, pp. 129-136 (8 pages)
Keywords: word frequency analysis; normal distribution; high-frequency words; Zipf's law
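
The abstract names four existing ways of cutting off high-frequency words but gives no formulas. The following is a minimal Python sketch of how they might look, under stated assumptions. The function names are hypothetical. %WF=P is read here as "keep the top words until they cover a share P of all occurrences", which is one possible interpretation. The T formula is assumed to be Donohue's high/low-frequency boundary derived from Zipf's law, T = (-1 + sqrt(1 + 8*I1)) / 2, where I1 is the number of words occurring exactly once; the abstract does not confirm this.

```python
from collections import Counter
from math import sqrt


def top_n(freq, n):
    """TOPN: keep the n most frequent words."""
    return [word for word, _ in Counter(freq).most_common(n)]


def min_count(freq, m):
    """WF>=M: keep every word whose frequency is at least m."""
    return [word for word, count in freq.items() if count >= m]


def cumulative_share(freq, p):
    """%WF=P (assumed reading): add words in descending frequency order
    until the kept words cover a share p of all occurrences."""
    total = sum(freq.values())
    kept, running = [], 0
    for word, count in Counter(freq).most_common():
        if running >= p * total:
            break
        kept.append(word)
        running += count
    return kept


def donohue_threshold(freq):
    """T (assumed to be Donohue's formula, not confirmed by the abstract):
    T = (-1 + sqrt(1 + 8 * I1)) / 2, with I1 = number of words occurring once."""
    i1 = sum(1 for count in freq.values() if count == 1)
    return (-1 + sqrt(1 + 8 * i1)) / 2


# Toy keyword counts, invented purely for illustration.
sample = {"a": 10, "b": 5, "c": 2, "d": 1, "e": 1, "f": 1}
print(top_n(sample, 2))
print(min_count(sample, 2))
print(cumulative_share(sample, 0.5))
print(donohue_threshold(sample))
```

The first three cutoffs each depend on a hand-picked parameter (n, m, or p), which matches the abstract's criticism that they are empirical and arbitrary. The F formula is not sketched because the abstract does not state it.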