Chinese Webpage Keyword Extraction Based on Semantics Extension Model (cited by 4)

Abstract: This paper proposes a stepwise, unsupervised keyword extraction method for Chinese webpages based on a semantics extension model. First, an evaluation function scores each word on its webpage structure features, part of speech, word length, and TF-IDF value, transforming the term-document matrix, and a clustering algorithm then selects candidate keywords. Second, drawing on n-gram language model theory, a word-based semantics extension model is built with features such as Accessor Variety (AV), and the candidate keywords are extended, without supervision, into keyword strings (key phrases). Experimental results show that the method effectively improves the extraction of out-of-vocabulary words and phrases and yields higher-quality Chinese webpage keywords than traditional keyword extraction algorithms.
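The first step of the abstract (scoring each word on web-structure, part-of-speech, length, and TF-IDF features, then clustering to select candidates) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' evaluation function: the linear combination, the weights in `WEIGHTS`, the noun tag set `NOUN_TAGS`, the use of title membership as the only web-structure feature, and the one-dimensional two-means split standing in for the paper's clustering algorithm are all hypothetical choices. Input is assumed to be already word-segmented.

```python
import math
from collections import Counter

# Hypothetical weights and tag set: the abstract names the features
# (web structure, part of speech, word length, TF-IDF) but not how they combine.
WEIGHTS = {"tfidf": 1.0, "in_title": 0.5, "noun": 0.3, "length": 0.1}
NOUN_TAGS = {"n", "nr", "ns", "nt", "nz"}  # assumed noun POS tags


def tfidf(doc, corpus):
    """Smoothed TF-IDF of each term in `doc` (token list) against `corpus` (list of token lists)."""
    counts = Counter(doc)
    n_docs = len(corpus)
    result = {}
    for term, count in counts.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        result[term] = count / len(doc) * idf
    return result


def score_terms(doc, pos, title_tokens, corpus):
    """Assumed linear combination of TF-IDF with title, noun and length indicators."""
    scores = {}
    for term, value in tfidf(doc, corpus).items():
        scores[term] = (WEIGHTS["tfidf"] * value
                        + WEIGHTS["in_title"] * (term in title_tokens)  # web-structure proxy
                        + WEIGHTS["noun"] * (pos.get(term) in NOUN_TAGS)
                        + WEIGHTS["length"] * min(len(term), 4))  # character length, capped at 4
    return scores


def two_means_candidates(scores, iters=20):
    """Split scores with 1-D k-means (k=2) and return the high cluster, best first."""
    values = list(scores.values())
    c_lo, c_hi = min(values), max(values)
    if c_lo == c_hi:
        return sorted(scores)  # no separation possible: keep every term
    for _ in range(iters):
        high = [v for v in values if abs(v - c_hi) < abs(v - c_lo)]
        low = [v for v in values if abs(v - c_hi) >= abs(v - c_lo)]
        new_lo = sum(low) / len(low) if low else c_lo
        new_hi = sum(high) / len(high) if high else c_hi
        if (new_lo, new_hi) == (c_lo, c_hi):
            break  # assignments have converged
        c_lo, c_hi = new_lo, new_hi
    picked = [t for t, v in scores.items() if abs(v - c_hi) < abs(v - c_lo)]
    return sorted(picked, key=scores.get, reverse=True)


if __name__ == "__main__":
    corpus = [["语义", "扩展", "模型", "关键词", "抽取", "关键词"],
              ["网页", "正文", "抽取"],
              ["视频", "网站", "分析"]]
    pos = {"语义": "n", "扩展": "v", "模型": "n", "关键词": "n", "抽取": "v"}
    scores = score_terms(corpus[0], pos, {"关键词", "抽取"}, corpus)
    print(two_means_candidates(scores))  # prints ['关键词']
```

Clustering the scores into a high and a low group avoids fixing a score threshold by hand, which is one plausible reason for a clustering step; on this toy corpus only 关键词 lands in the high cluster.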
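Authors: 汪洋 (Wang Yang), 帅建梅 (Shuai Jianmei)
Source: Computer Engineering (《计算机工程》; indexed in CAS and CSCD), 2012, Issue 22, pp. 163-166 (4 pages)
Funding: National High Technology Research and Development Program of China ("863" Program) project "Semantics-Based Automatic Discovery, Analysis and Evaluation of Video Service Websites" (2008AA01Z408)
Keywords: Chinese webpage keyword extraction; semantics extension model; Accessor Variety (AV); clustering algorithm; n-gram language model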
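The keyword list names Accessor Variety (AV), the statistic behind the abstract's second step. A string that appears after many different tokens and before many different tokens behaves like a self-contained unit, so AV(s) = min(number of distinct left neighbours, number of distinct right neighbours). The Python sketch below is not the authors' semantics extension model: it assumes word-segmented input and counts neighbouring words rather than characters (matching the word-based model), uses one shared sentinel for sentence boundaries, and grows a candidate one token at a time while a side has fewer than `min_av` distinct neighbours, on the reasoning that a low-variety side suggests the string is only part of a longer term. `extend_candidate`, `min_av`, and `max_len` are hypothetical names and parameters.

```python
from collections import Counter

BOUNDARY = "<s>"  # one shared sentinel for sentence start and end (simplifying assumption)


def occurrences(phrase, sentences):
    """Yield (sentence, start index) for every occurrence of the token tuple `phrase`."""
    n = len(phrase)
    for sent in sentences:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == phrase:
                yield sent, i


def neighbours(phrase, sentences, side):
    """Count the tokens seen immediately to the left or right of `phrase`."""
    counts = Counter()
    n = len(phrase)
    for sent, i in occurrences(phrase, sentences):
        if side == "left":
            counts[sent[i - 1] if i > 0 else BOUNDARY] += 1
        else:
            counts[sent[i + n] if i + n < len(sent) else BOUNDARY] += 1
    return counts


def accessor_variety(phrase, sentences):
    """AV = min(distinct left neighbours, distinct right neighbours); 0 if unseen."""
    phrase = tuple(phrase)
    return min(len(neighbours(phrase, sentences, "left")),
               len(neighbours(phrase, sentences, "right")))


def extend_candidate(word, sentences, min_av=2, max_len=4):
    """Hypothetical greedy extension: grow right, then left, while neighbour variety is low."""
    phrase = (word,)
    for side in ("right", "left"):
        while len(phrase) < max_len:
            nbrs = neighbours(phrase, sentences, side)
            if not nbrs or len(nbrs) >= min_av:
                break  # enough variety on this side: treat it as a term boundary
            token, _ = nbrs.most_common(1)[0]
            if token == BOUNDARY:
                break
            phrase = phrase + (token,) if side == "right" else (token,) + phrase
    return phrase


if __name__ == "__main__":
    sentences = [["基于", "语义", "扩展", "模型", "的", "方法"],
                 ["语义", "扩展", "模型", "可以", "抽取", "短语"],
                 ["我们", "提出", "语义", "扩展", "模型"],
                 ["扩展", "功能"]]
    phrase = extend_candidate("语义", sentences)
    print("".join(phrase), accessor_variety(phrase, sentences))  # prints: 语义扩展模型 3
```

In this toy corpus 语义扩展 is always followed by 模型 (right AV of 1), so the candidate keeps growing until 语义扩展模型, whose neighbours vary on both sides; 扩展 alone already has two distinct neighbours on each side and is left unextended.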