Abstract
The rapid growth of the Internet and rising demand for information have made the automatic processing and mining of web information a research hotspot. In information extraction tasks over web text, the observation sequence is always given, so there is no need to model the probability of generating the observations; only the probability of the state transitions induced by the observations matters. The maximum entropy Markov model (MEMM) changes the transition probability function so that each state transition depends on both the input value and the previous state, which captures the contextual information of the sequence well. Applying the MEMM to address information extraction substantially improved both precision and recall.
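As an illustration of the idea described in the abstract, the following minimal sketch conditions each state transition on the previous state and the current observation, using an exponential (maximum entropy) form, and decodes with Viterbi. Everything specific here is an assumption made for illustration and is not taken from the paper: the label set `STATES`, the feature function `features`, the hand-set `WEIGHTS`, and the toy tokens. A real model would learn the weights from labeled data.

```python
import math

# Hypothetical label set: a token is either part of an address or not.
STATES = ["OTHER", "ADDR"]

# Hand-set toy weights keyed by (feature name, candidate state).
# These are illustrative assumptions, not trained values.
WEIGHTS = {
    ("prev=ADDR", "ADDR"): 1.0,
    ("prev=OTHER", "OTHER"): 0.5,
    ("has_digit", "ADDR"): 1.5,
    ("suffix=Road", "ADDR"): 2.0,
    ("lowercase", "OTHER"): 2.0,
}


def features(prev_state, token):
    """Binary features of the previous state and the current observation."""
    feats = ["prev=" + prev_state]
    if any(ch.isdigit() for ch in token):
        feats.append("has_digit")
    if token.endswith("Road"):
        feats.append("suffix=Road")
    if token.islower():
        feats.append("lowercase")
    return feats


def transition_probs(prev_state, token):
    """P(state | prev_state, token) as a normalized exponential of feature weights."""
    feats = features(prev_state, token)
    scores = {s: sum(WEIGHTS.get((f, s), 0.0) for f in feats) for s in STATES}
    z = sum(math.exp(v) for v in scores.values())  # per-context normalizer
    return {s: math.exp(v) / z for s, v in scores.items()}


def viterbi(tokens, start_state="OTHER"):
    """Return the most probable state sequence for the given tokens."""
    # best[s] = (log-probability of the best path ending in s, that path)
    best = {start_state: (0.0, [])}
    for token in tokens:
        new_best = {}
        for prev, (logp, path) in best.items():
            for s, p in transition_probs(prev, token).items():
                cand = (logp + math.log(p), path + [s])
                if s not in new_best or cand[0] > new_best[s][0]:
                    new_best[s] = cand
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

With these toy weights, `viterbi(["visit", "88", "Zhongshan", "Road", "today"])` labels the three middle tokens `ADDR` and the outer two `OTHER`. No emission probability appears anywhere: the observation only enters through the features of the transition distribution, which is the point the abstract makes about given observation sequences.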
Source
《计算机工程与应用》
CSCD
PKU Core Journal (北大核心)
2005, No. 21, pp. 192-194 (3 pages)
Computer Engineering and Applications
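Funding
Major Special Project of the National 863 High-Tech Research and Development Program: sub-project on the broadband network application support platform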