Abstract
Hot words are a lexical phenomenon of the Internet that reflects the issues people widely care about within a particular time and place. This paper studies two key technologies of hot-word analysis: hot-word discovery and hot-word association. In the discovery phase, named entity recognition and high-frequency string statistics are first used to mine candidate phrase strings, and a hotness weight is then computed from two indicators, a base weight and a fluctuation weight. In the association phase, hot words are divided into classes according to their weights, and the associations between hot-word classes are computed on the principle of co-occurrence rate. The method has been successfully applied to the hot-topic discovery module of the TRS public opinion monitoring system.
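The abstract names the two weighting indicators and the co-occurrence principle but gives no formulas, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes each document has already been reduced to a set of candidate phrases (the named entity and high-frequency string stage is not shown), takes the base weight to be a phrase's document rate in the current window, takes the fluctuation weight to be its bounded rise over the mean rate in past windows, and takes the co-occurrence rate to be a Jaccard-style ratio. The function names and the mixing parameter `alpha` are hypothetical.

```python
from collections import Counter


def doc_rates(docs):
    """Fraction of documents in which each term appears."""
    counts = Counter(term for doc in docs for term in set(doc))
    return {term: n / len(docs) for term, n in counts.items()}


def hot_scores(current_docs, past_windows, alpha=0.5):
    """Combine an assumed base weight and fluctuation weight per term.

    base  = document rate in the current window (assumption)
    fluct = rise over the mean past rate, scaled into [0, 1] (assumption)
    """
    current = doc_rates(current_docs)
    past = [doc_rates(window) for window in past_windows]
    scores = {}
    for term, base in current.items():
        mean_past = sum(p.get(term, 0.0) for p in past) / len(past) if past else 0.0
        fluct = max(0.0, base - mean_past) / (base + mean_past)
        scores[term] = alpha * base + (1 - alpha) * fluct
    return scores


def cooccurrence_rate(docs, a, b):
    """Documents containing both terms over documents containing either (assumption)."""
    both = sum(1 for d in docs if a in d and b in d)
    either = sum(1 for d in docs if a in d or b in d)
    return both / either if either else 0.0


# Toy data: each document is a set of already-extracted candidate phrases.
today = [
    {"earthquake", "rescue"},
    {"earthquake", "rescue"},
    {"earthquake", "weather"},
    {"weather"},
]
past = [
    [{"weather"}, {"weather"}],
    [{"weather"}, {"sports"}],
]

scores = hot_scores(today, past)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['earthquake', 'rescue', 'weather']
print(cooccurrence_rate(today, "earthquake", "rescue"))
```

In this toy run, "weather" is frequent but steady, so its fluctuation term is zero and it ranks below the newly rising "earthquake" and "rescue". That is the intuition behind pairing a base weight with a fluctuation weight.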
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journal
2011, No. 1, pp. 48-53, 59 (7 pages)
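Funding
National 863 Program Key Project (2006AA010105)
National Natural Science Foundation of China (60772081)
Beijing Municipal Education Commission Science and Technology Development Program (KM200910772022)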
Keywords
hot words
named entity recognition
hotness computation
fluctuation weight
word-group relationships