The Classification System Construction for Internet Information both Practical and Scientific (融合实用性与科学性的互联网信息分类体系构建)
Cited by: 8
Abstract: A classification system is an effective form of information organization. Traditional document classification systems adapt poorly to the change in the objects being classified and lack practicality, while existing web classification systems lack scientific rigor. An Internet information classification system that combines practicality with scientific rigor can effectively meet users' information needs and provides a foundation for research on automatic text classification. Taking the Chinese Library Classification (中图法) and the Sina portal as respective examples, this paper examines the strengths and weaknesses of traditional document classification and web information classification, proposes three design principles for an Internet information classification system (practicality, scientific rigor, and balance), and builds such a system on these principles. To verify its effectiveness, a web crawler was used to collect corpora from the NetEase portal (www.163.com) and Tencent (www.qq.com) as experimental data, and comparative experiments were run against the classification system of the Fudan corpus. The results show that, compared with the Fudan corpus classification system, the proposed system is more practical, covers the range of Internet information more comprehensively, has little overlap between categories and similar amounts of information in each category, and yields better text classification results.
Source: 《图书与情报》 (Library & Information), CSSCI, Peking University Core Journal, 2015, No. 3, pp. 118-124, 144 (8 pages in total)
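Funding: One of the research outputs of the National Natural Science Foundation of China project "Multidisciplinary Collaborative Modeling Theory and Experimental Research for Text Classification" (面向文本分类的多学科协同建模理论与实验研究; Grant No. 71373291)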
Keywords: Internet information; classification system; Chinese Library Classification; corpus