The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The new method converts text to words vector space model at first, and then splits words into a set of ...The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The new method converts text to words vector space model at first, and then splits words into a set of concepts. Through computing the inner products between concepts, it obtains the similarity between words. The new method computes the similarity of text based on the similarity of words at last. The contributions of the paper include: 1) propose a new computing formula between words; 2) propose a new text similarity computing method based on words similarity; 3) successfully use the method in the application of similarity computing of WEB news; and 4) prove the validity of the method through extensive experiments.展开更多
基金Supported by the China Postdoctoral Science Foundation (Grant No. 20060400002)the Sichuan Youth Science and Technology Foundation of China (Grant No. 08JJ0109)+2 种基金the National Natural Science Foundation of China (Grant Nos.60473051, 60503037)the National High-tech Re- search and Development of China (Grant No. 2006AA01Z230)the Natural Science Foundation of Beijing Natural Science Foundation (Grant No. 4062018)
文摘The paper proposes a new text similarity computing method based on concept similarity in Chinese text processing. The new method converts text to words vector space model at first, and then splits words into a set of concepts. Through computing the inner products between concepts, it obtains the similarity between words. The new method computes the similarity of text based on the similarity of words at last. The contributions of the paper include: 1) propose a new computing formula between words; 2) propose a new text similarity computing method based on words similarity; 3) successfully use the method in the application of similarity computing of WEB news; and 4) prove the validity of the method through extensive experiments.