
Web page classification based on extracting hierarchy from Web site (cited by: 4)
Abstract: Automatic Web page classification is an active research topic in Internet search. Existing approaches classify pages mainly by their text content or by the hyperlink structure between pages. These approaches use only information from the pages themselves and ignore the information provided by the Web site that hosts them. This paper proposes a new algorithm that simplifies the internal topology of a Web site, extracts the site's implicit hierarchy, and builds a hierarchy tree, so that the pages within the site can be classified at multiple levels. The method has been applied successfully in an intelligent search and mining system for e-commerce.
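Source: Journal of Computer Applications (《计算机应用》), CSCD, PKU Core Journal, 2006, No. 5, pp. 1134-1136 (3 pages)
Funding: Guangdong Province Science and Technology Research Program (2005B10101033 A10202001); Guangzhou Science and Technology Research Program (2004Z2-D0091)
Keywords: Web page classification; hierarchy of Web site; URL clustering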
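
The abstract and keywords mention URL clustering and a hierarchy tree but do not give the algorithm itself. As a rough illustration only, the sketch below groups a site's pages by URL path prefix and reads each page's multi-level categories off its directory segments. It is not the authors' topology-simplification method. The function names `build_url_tree` and `categories_of` and the `shop.example.com` URLs are hypothetical, chosen for this example.

```python
from urllib.parse import urlparse


def path_segments(url):
    """Split the path of a URL into its non-empty segments."""
    return [s for s in urlparse(url).path.split("/") if s]


def build_url_tree(urls):
    """Group URLs into a nested dict keyed by successive path segments.

    Assumption for illustration: the directory layout of the URLs
    mirrors the site's category hierarchy.
    """
    tree = {}
    for url in urls:
        node = tree
        for segment in path_segments(url):
            # Descend into the child for this segment, creating it if needed.
            node = node.setdefault(segment, {})
    return tree


def categories_of(url):
    """Return the directory segments above the page as its category path."""
    return path_segments(url)[:-1]


# Hypothetical e-commerce URLs used only to exercise the sketch.
urls = [
    "http://shop.example.com/books/fiction/item1.html",
    "http://shop.example.com/books/science/item2.html",
    "http://shop.example.com/music/jazz/item3.html",
]
tree = build_url_tree(urls)
```

Here `categories_of(urls[0])` yields the two-level label `["books", "fiction"]`. A real site's hierarchy is often not visible in its URLs, which is presumably why the paper works from the site's link topology instead.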

