Abstract
Drawing on the term frequency principle, the inverse document frequency principle, and the idea of co-word analysis, this paper proposes four principles for quantifying the novelty of a document's theme. On this basis, it defines three concepts: the timestamped inverse document frequency of a keyword, the timestamped inverse document frequency of a keyword pair, and document novelty. It then gives a formula for computing document novelty and tests the practicality and rationality of that formula empirically. The experimental results show that the proposed method for quantifying document theme novelty is scientific, reasonable, and workable. However, non-standard keyword indexing, too few keywords, and similar problems strongly affect the accuracy of the measured novelty.
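The abstract names the ingredients of the method but does not reproduce the formula itself. As a rough illustration only, the sketch below shows one way an inverse document frequency restricted to earlier documents could be computed for single keywords and for keyword pairs, then averaged into a novelty score. The smoothing, the "strictly earlier" cutoff, the unweighted average, and the names `timestamped_idf` and `document_novelty` are all assumptions made for this example; they are not the paper's definitions.

```python
import math
from itertools import combinations


def timestamped_idf(term, when, corpus):
    """Smoothed IDF of `term` over documents dated strictly before `when`.

    `term` is a frozenset holding one keyword or two keywords (a pair).
    `corpus` is a list of (year, keywords) tuples.
    The +1 smoothing is an assumption, not taken from the paper.
    """
    earlier = [set(kws) for year, kws in corpus if year < when]
    # A document counts if it contains every keyword in `term`.
    df = sum(1 for kws in earlier if term <= kws)
    return math.log((len(earlier) + 1) / (df + 1))


def document_novelty(keywords, when, corpus):
    """Unweighted mean of the IDFs of all keywords and keyword pairs.

    Averaging keywords and pairs together is an illustrative choice only.
    """
    kws = sorted(set(keywords))
    terms = [frozenset([k]) for k in kws]
    terms += [frozenset(pair) for pair in combinations(kws, 2)]
    if not terms:
        return 0.0
    return sum(timestamped_idf(t, when, corpus) for t in terms) / len(terms)


# Hypothetical toy corpus: (publication year, author keywords).
corpus = [
    (2010, ["tf", "idf"]),
    (2011, ["idf", "co-word"]),
    (2012, ["novelty"]),
]
seen = document_novelty(["idf"], 2012, corpus)
unseen = document_novelty(["novelty"], 2012, corpus)
```

In this toy corpus a keyword that appears in every earlier document scores lower than one that has never appeared before. A document with a single keyword contributes no pairs at all, which is consistent with the abstract's remark that too few keywords hurt the accuracy of the measurement.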
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journals (北大核心)
2013, No. 3, pp. 99-102 (4 pages)
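Funding
Supported by the National Social Science Fund of China (Project No. 09BTQ020)
Supported by the Key Project of Philosophy and Social Science Research in Jiangsu Universities (Project No. 2011ZDIXM035)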
Keywords
novelty of document theme
keyword
measurement method