Abstract
This paper combines the internal features of an HTML page with its external structural layout and proposes using a mapping table, a page-mapping model, to transform the view of the page. The page is then segmented into regions, and those regions are identified, on the basis of structural and heuristic rules, so that the content of each region can be obtained accurately. Experimental results show that the method performs fairly well for region segmentation and identification on Web pages with a variety of complex structures.
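The abstract does not spell out how the mapping table is built or which heuristic rules are applied, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes the page is laid out with a single non-nested `<table>`, that the mapping table is a grid from (row, column) slots to cells after `rowspan`/`colspan` are expanded, and that regions are labeled by link count and word count. The names `CellCollector`, `build_mapping_table`, `classify`, and `segment`, and all thresholds, are hypothetical.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the <td>/<th> cells of a single, non-nested layout table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell dicts per <tr>
        self._cell = None   # cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            # Assumes span attributes, when present, are valid integers.
            self._cell = {
                "rowspan": int(a.get("rowspan") or 1),
                "colspan": int(a.get("colspan") or 1),
                "links": 0,
                "text": "",
            }
            self.rows[-1].append(self._cell)
        elif tag == "a" and self._cell is not None:
            self._cell["links"] += 1

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data


def build_mapping_table(rows):
    """Map every (row, col) grid slot to the index of the cell covering it."""
    grid, cells = {}, []
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in grid:   # slot already taken by an earlier rowspan
                c += 1
            cells.append(cell)
            idx = len(cells) - 1
            for dr in range(cell["rowspan"]):
                for dc in range(cell["colspan"]):
                    grid[(r + dr, c + dc)] = idx
            c += cell["colspan"]
    return grid, cells


def classify(cell):
    """Hypothetical heuristic rules; the thresholds are illustrative only."""
    words = len(cell["text"].split())
    if cell["links"] >= 3 and words <= 4 * cell["links"]:
        return "navigation"   # many links, little text
    if words >= 20:
        return "content"      # long run of text
    return "other"


def segment(html):
    """Return the mapping table, the cells, and one region label per cell."""
    parser = CellCollector()
    parser.feed(html)
    grid, cells = build_mapping_table(parser.rows)
    return grid, cells, [classify(cell) for cell in cells]
```

Calling `segment(page_html)` on a table-based page returns the grid, the list of cells, and a label for each cell. A real implementation would also need to handle nested tables and non-table layouts, which this sketch leaves out.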
Source
Modern Computer (《现代计算机》)
2006, Issue 6, pp. 48-50, 60 (4 pages)
Funding
Supported by the Natural Science Foundation of Shandong Province (y2005G21)
Keywords
Mapping Table
Heuristic Rules
HTML
Page Segmentation
Area Identification