Abstract
The predicate is the most important constituent of a sentence, and whether it is identified correctly has a large effect on semantic analysis. The performance of predicate identification and classification depends on many features, so how those features are organized and combined is especially important. This paper selects 7 basic features and more than 30 new features, together with their combinations. Using a maximum entropy classifier, it starts from a baseline built on the basic features and incrementally adds beneficial features. This raises the F1-score of predicate identification by 5.1 percentage points (from 84.7% to 89.8%) and the F1-score of predicate classification (word-sense recognition) by 1.8 percentage points (from 80.3% to 82.1%). The results show that the new features and their combinations substantially improve the system's performance.
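The abstract does not spell out the exact selection procedure or training algorithm. The following is a minimal sketch, assuming a greedy forward search over feature templates. Each candidate set is scored by development-set F1 from a binary maximum entropy model, which is equivalent to logistic regression over indicator features and is trained here by plain stochastic gradient ascent. The templates (`word`, `pos`, `suffix2`), the toy data, and every function name are hypothetical illustrations, not the paper's features or code. Only the binary identification step is sketched.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a token is a dict of attributes, and a
# feature template maps a token to one string-valued feature.
Token = Dict[str, str]
Template = Callable[[Token], str]
Example = Tuple[Token, int]  # (token, 1 if it is a predicate else 0)

TEMPLATES: Dict[str, Template] = {
    "word": lambda t: t["word"],
    "pos": lambda t: t["pos"],
    "suffix2": lambda t: t["word"][-2:],
}


def featurize(token: Token, active: Sequence[str]) -> List[str]:
    """Instantiate each active template as an indicator feature name."""
    return [f"{name}={TEMPLATES[name](token)}" for name in active]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_maxent(data: List[Example], active: Sequence[str],
                 epochs: int = 50, lr: float = 0.5) -> Dict[str, float]:
    """Binary maximum entropy model over indicator features (logistic
    regression without an intercept), fit by stochastic gradient ascent
    on the conditional log-likelihood."""
    w: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for token, y in data:
            feats = featurize(token, active)
            p = sigmoid(sum(w[f] for f in feats))
            for f in feats:
                w[f] += lr * (y - p)  # d(log-likelihood)/d(w[f])
    return dict(w)


def f1_score(gold: List[int], pred: List[int]) -> float:
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(train: List[Example], dev: List[Example],
             active: Sequence[str]) -> float:
    """Train on `train` with the active templates and return dev F1."""
    w = train_maxent(train, active)
    pred = [1 if sum(w.get(f, 0.0) for f in featurize(t, active)) > 0 else 0
            for t, _ in dev]
    return f1_score([y for _, y in dev], pred)


def forward_select(train: List[Example], dev: List[Example],
                   base: Sequence[str], candidates: Sequence[str]
                   ) -> Tuple[List[str], float]:
    """Greedy forward selection: starting from the base templates, add the
    single candidate that most improves dev F1; stop when none helps."""
    selected = list(base)
    best = evaluate(train, dev, selected)
    while True:
        best_cand = None
        for cand in candidates:
            if cand in selected:
                continue
            score = evaluate(train, dev, selected + [cand])
            if score > best:
                best, best_cand = score, cand
        if best_cand is None:
            return selected, best
        selected.append(best_cand)


if __name__ == "__main__":
    def tok(word: str, pos: str) -> Token:
        return {"word": word, "pos": pos}

    train = [(tok("eat", "V"), 1), (tok("run", "V"), 1),
             (tok("apple", "N"), 0), (tok("dog", "N"), 0)]
    dev = [(tok("sing", "V"), 1), (tok("jump", "V"), 1),
           (tok("tree", "N"), 0), (tok("book", "N"), 0)]
    print(*forward_select(train, dev, ["word"], ["suffix2", "pos"]))
    # prints: ['word', 'pos'] 1.0
```

In this toy run, the word-identity baseline cannot generalize to unseen dev words, so its F1 is 0. Adding the POS template lifts F1 to 1.0, and no further candidate helps. This mirrors, in miniature, the "baseline plus beneficial features" idea. A real system would search over many more templates and their conjunctions, and it would use a held-out set separate from the final test data.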
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 9, pp. 134-137 (4 pages)
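Funding
National Natural Science Foundation of China (No. 60673041)
National High Technology Research and Development Program of China (863 Program) (No. 2006AA01Z147)
Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20060285008)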
Keywords
predicate identification and predicate classification
semantic analysis
feature engineering
maximum entropy classifier