Abstract
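A method based on domain sample queries is proposed for classifying this kind of Web database. The main attributes of a domain are obtained automatically by analyzing the domain's advanced query interfaces, and domain knowledge is used to build query samples for those attributes. Probing queries are then submitted to the query interface, and the degree of relevance between the Web database and the domain is estimated from the result schema and the record contents of the returned result pages. Experiments on Web databases from several domains show that the method is effective for classifying Web databases that provide only a simple query interface, achieving high classification precision, recall, and F-measure values.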
This paper proposes an approach based on domain sample queries for classifying Web databases. The main attributes of a domain are obtained by analyzing the descriptive attribute labels in advanced query interfaces. Probing queries are then submitted to the simple query interface of a Web database, and the correlation between that Web database and the domain is estimated from the result schema and the records of the returned result pages. Experiments on several domains show that the approach is effective and achieves high classification precision, recall, and F-measure values.
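The relevance estimate described above can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the function names, the equal weighting of schema and content evidence, and the substring-based record matching are hypothetical and are not the authors' method.

```python
def schema_overlap(result_columns, domain_attributes):
    """Fraction of domain main attributes found among result-page column labels."""
    cols = {c.strip().lower() for c in result_columns}
    attrs = {a.strip().lower() for a in domain_attributes}
    if not attrs:
        return 0.0
    return len(cols & attrs) / len(attrs)


def content_hit_rate(records, sample_value):
    """Fraction of returned records whose text contains the probed sample value."""
    if not records:
        return 0.0
    needle = sample_value.lower()
    hits = sum(1 for record in records if needle in record.lower())
    return hits / len(records)


def domain_relevance(probe_results, domain_attributes, schema_weight=0.5):
    """Average a weighted mix of schema and content evidence over all probes.

    probe_results: list of (sample_value, result_columns, records) tuples,
    one per probing query. The weighting scheme is an assumption.
    """
    if not probe_results:
        return 0.0
    total = 0.0
    for sample_value, result_columns, records in probe_results:
        s = schema_overlap(result_columns, domain_attributes)
        c = content_hit_rate(records, sample_value)
        total += schema_weight * s + (1 - schema_weight) * c
    return total / len(probe_results)


# Hypothetical probes against a candidate database for a "book" domain.
book_attributes = ["title", "author", "publisher", "isbn"]
probes = [
    ("Tolkien", ["Title", "Author", "Price"],
     ["The Hobbit, Tolkien", "The Silmarillion, Tolkien"]),
    ("Knuth", [], []),  # this probe returned no results
]
score = domain_relevance(probes, book_attributes)
```

A database would then be assigned to the domain whose sample queries yield the highest score, or rejected if no score passes a chosen threshold; that decision rule is likewise an assumption.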
Source
《微电子学与计算机》
CSCD
PKU Core Journal
2010, Issue 3, pp. 20-23 (4 pages)
Microelectronics & Computer
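Funding
National Natural Science Foundation of China (60673092)
Jiangsu Province Major Science and Technology Support and Independent Innovation Project (BE2008044)
Jiangsu Province "Six Talent Peaks" Project (06-E-037)
Jiangsu Province Graduate Innovation Program (CX08B_099z)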