Abstract
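Natural language is an important source of spatial data, and extracting geospatial information from natural language is a major research topic in geographical information science. A well-developed classification scheme for geographical named entities supports the parsing, storage, organization, management, analysis, and shared use of geospatial information in natural language. Existing classification schemes for fundamental geographic features, place names, and organizations each target a different application domain, can represent only some of the geographical named entities found in natural language, and do not consider spatio-temporal association. Drawing on a large number of related standards and on the annotation results of a large body of natural language text, this paper designs a classification scheme for geographical named entities (abbreviated "GNEC"). The scheme uses the spatial location, geographical features, and attributes that an entity refers to as classification criteria, and combines a main table with a subdivision table. Using both quantitative and qualitative methods, the paper analyzes the compatibility of GNEC with GB/T 18521-2001, GB/T 13923-2006, the CHGIS place-name classification scheme, and the ADL FTT vocabulary, and it validates the applicability of GNEC using geographical named entity resolution in Chinese text and map services as examples. Diversity is an important characteristic of how geographical entities are described in natural language, whereas a classification scheme mainly conceptualizes geographical named entities. Building an ontology on top of GNEC is therefore an effective way to address this problem.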
With the growing use of natural language in geographical information science, the resolution of geospatial information in natural language has become a prominent research issue. Geographical named entities are identifiers of geographical location information in natural language; they cover most popular geographical reference systems, such as geographical names, addresses, postal codes, telephone numbers, and other relative location descriptions. A complete classification scheme of geographical named entities can support the resolution, storage, management, analysis, and sharing of geographical information in natural language. Commonly used classifications, i.e., classifications of geographical features, classifications of place names, and organization classifications, have several disadvantages: their class items are overly specific, they do not consider the relationship between time and space, and they can represent only some of the geographical entities found in natural language. To overcome these problems, based on the annotation results of geographical named entities in Chinese documents, we design a classification scheme of geographical named entities (GNEC) that considers their location, attributes, geographical features, and temporal features. GNEC includes one main classification of geographical feature types and one subdivision classification of Chinese historical dynasties. Finally, the semantic compatibility between the proposed classification and GB/T 18521-2001, GB/T 13923-2006, the Feature Type Classification for Chinese Historical Places of Harvard University, and the Feature Type Thesaurus of the Alexandria Digital Library is analyzed qualitatively and quantitatively. A single geographical entity is usually described with diverse words in natural language, and sometimes it refers to different physical locations. Classification schemes aim to conceptualize geographical named entities. Constructing ontologies based on classification schemes could therefore address this kind of problem (i.e., semantic ambiguity of geographical named entities) effectively.
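The combination of a main classification (geographical feature types) with a subdivision classification (Chinese historical dynasties) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the codes `ADM`, `HYD`, `ORG`, `TANG`, `SONG`, the `/` separator, and the `EntityClass` name are hypothetical and are not taken from GNEC itself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical main table: feature-type code -> label (NOT the real GNEC codes).
MAIN_TABLE = {
    "ADM": "administrative region",
    "HYD": "hydrographic feature",
    "ORG": "organization",
}

# Hypothetical subdivision table: dynasty code -> label (NOT the real GNEC codes).
SUBDIVISION_TABLE = {
    "TANG": "Tang dynasty",
    "SONG": "Song dynasty",
}


@dataclass(frozen=True)
class EntityClass:
    """A class built from a main-table code plus an optional subdivision code."""

    feature_code: str
    dynasty_code: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject codes that are not defined in either table.
        if self.feature_code not in MAIN_TABLE:
            raise ValueError(f"unknown feature code: {self.feature_code}")
        if self.dynasty_code is not None and self.dynasty_code not in SUBDIVISION_TABLE:
            raise ValueError(f"unknown dynasty code: {self.dynasty_code}")

    @property
    def code(self) -> str:
        # The subdivision code is appended only when a temporal qualifier exists.
        if self.dynasty_code is None:
            return self.feature_code
        return f"{self.feature_code}/{self.dynasty_code}"

    @property
    def label(self) -> str:
        base = MAIN_TABLE[self.feature_code]
        if self.dynasty_code is None:
            return base
        return f"{base} ({SUBDIVISION_TABLE[self.dynasty_code]})"


modern = EntityClass("ADM")
historical = EntityClass("ADM", "TANG")
print(modern.code, "->", modern.label)
print(historical.code, "->", historical.label)
```

Keeping the temporal qualifier in a separate table means the main table does not need one entry per feature type and dynasty pair, which is the usual motivation for a main-plus-subdivision design.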
Source
《地球信息科学学报》 (Journal of Geo-information Science)
CSCD
PKU Core Journal (北大核心)
2010, Issue 2, pp. 220-227 (8 pages)
Funding
National 863 Program project (2007AA12Z221)
National 863 Program project (2007AA12Z218)
Supported by the Nanjing Normal University Key Research Fund (2006105XGQ0051)
Keywords
geographical named entity
classification scheme
geographical information system
information sharing
geographical named entity; geographical feature classification; geographical information system; information sharing