Abstract
This paper presents a method for identifying new technology terms in Chinese patent texts. First, term elements are extracted from the documents using the ICTCLAS word segmentation system and a stop-word list. An improved TFIDF model then computes the weight of each element, and the hot elements are selected by weight. Next, the hot elements are combined in order according to the measured distance between words, and a term set is obtained after weight computation and threshold filtering. Experts then manually judge this set to identify the valid new technology terms. Finally, an application example is presented and analyzed to verify the effectiveness of the method.
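The pipeline described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the input is already segmented into token lists (for example by ICTCLAS; no segmenter call is shown), standard TF-IDF stands in for the paper's improved TFIDF model, whose details are not given here, and a simple co-occurrence count with a minimum-frequency threshold stands in for the term weight computation. The function names and parameters (`top_k`, `max_gap`, `min_count`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Standard TF-IDF summed over all documents (not the paper's improved model).

    docs is a list of token lists, assumed to be already segmented.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    weights = Counter()
    for doc in docs:
        for tok, count in Counter(doc).items():
            weights[tok] += (count / len(doc)) * math.log(n_docs / doc_freq[tok])
    return weights


def hot_elements(docs, stop_words, top_k):
    """Drop stop words, then keep the top_k elements by TF-IDF weight."""
    filtered = [[t for t in doc if t not in stop_words] for doc in docs]
    weights = tfidf_weights(filtered)
    hot = {tok for tok, _ in weights.most_common(top_k)}
    return hot, filtered


def candidate_terms(docs, hot, max_gap=1, min_count=2):
    """Combine hot elements in text order when they lie within max_gap positions.

    A pair is kept as a candidate term if it occurs at least min_count times
    (an assumed stand-in for the weight and threshold step).
    """
    counts = Counter()
    for doc in docs:
        for i, left in enumerate(doc):
            if left not in hot:
                continue
            for j in range(i + 1, min(i + 1 + max_gap, len(doc))):
                if doc[j] in hot:
                    counts[(left, doc[j])] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

In this sketch, the dictionary returned by `candidate_terms` plays the role of the term set that would be handed to experts for manual judgment; the final validation step is deliberately left out because it is not automated in the described method.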
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
PKU Core Journals (北大核心)
2012, No. 11, pp. 53-59 (7 pages)
Keywords
Technology life cycle
Term detection
Hot elements of terms