Abstract
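Using the principles of data compression, text documents are compressed and a similarity value is obtained from the compression-ratio formula. Compared with traditional statistics-based methods, this approach is simple and fast.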
In information retrieval, the traditional approach is to compute the similarity between texts, where a similarity coefficient expresses how closely two texts match. The two main methods are the correlation coefficient and the cosine measure. Building on the theory of data compression, we instead use the compression ratio to express the similarity between texts. Compared with the statistics-based methods, this approach is simple and fast.
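The abstract does not state the compression-ratio formula itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the normalized compression distance (NCD) as a stand-in formula and Python's standard `zlib` as the compressor; the function names `compressed_size` and `compression_similarity` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def compression_similarity(text_a: str, text_b: str) -> float:
    """Estimate similarity of two texts from compressed sizes.

    Assumption: uses the normalized compression distance (NCD),
    not the formula from the paper, which the abstract does not give.
    Higher return values mean more similar texts.
    """
    x = text_a.encode("utf-8")
    y = text_b.encode("utf-8")
    cx = compressed_size(x)
    cy = compressed_size(y)
    # Compressing the concatenation lets the compressor reuse
    # substrings shared by both texts, so shared content costs little.
    cxy = compressed_size(x + y)
    ncd = (cxy - min(cx, cy)) / max(cx, cy)
    return 1.0 - ncd
```

The intuition is that if two texts share much of their content, compressing them together costs little more than compressing one alone, so the score approaches 1. No term counting or vector construction is needed, which is consistent with the simplicity the abstract claims. Scores on very short texts are unreliable because fixed compressor overhead dominates.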
Source
《延边大学学报(自然科学版)》
CAS
2004, Issue 2, pp. 143-146 (4 pages)
Journal of Yanbian University (Natural Science Edition)
Keywords
Information retrieval
Data compression
Similarity