The fundamental problem of similarity studies, in the framework of data mining, is to examine and detect similar items in very large articles, papers, and books. In this paper, we are interested in the probabilistic, statistical, and algorithmic aspects of text studies. We use the approach of k-shinglings, a k-shingling being defined as a sequence of k consecutive characters extracted from a text (k ≥ 1). The main challenge in this field is to find accurate and fast algorithms that compute similarity in short times. This is achieved using approximation methods. The first approximation method is statistical and is based on the Glivenko-Cantelli theorem. The second is the banding technique. The third is a modification of the algorithm proposed by Rajaraman et al. ([1]), denoted here as (RUM). The similarity measure used throughout the paper is the Jaccard index. Finally, we illustrate these results on the four Gospels. The results are conclusive.
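The following is a minimal Python sketch of the pipeline described above, assuming character k-shingles, the exact Jaccard index, and the textbook minhash-plus-banding scheme popularized by Rajaraman et al. It does not reproduce the paper's modified (RUM) algorithm or its Glivenko-Cantelli-based estimator. All helper names (shingles, jaccard, make_hashes, minhash_signature, estimate_jaccard, candidate_pair, candidate_probability) are hypothetical, and the random linear hash functions modulo a Mersenne prime are an assumed stand-in for random permutations of the shingle universe.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime for the hash family h(x) = (a*x + b) mod PRIME


def shingles(text: str, k: int) -> set:
    """Set of k-shingles: all runs of k consecutive characters (k >= 1)."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index |A & B| / |A | B| (taken as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hashes(n: int, seed: int = 0) -> list:
    """n random (a, b) pairs defining hypothetical hash functions (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n)]


def minhash_signature(sh: set, hashes: list) -> list:
    """For each hash function, keep the minimum hash value over all shingles."""
    if not sh:
        raise ValueError("cannot sign an empty shingle set")
    # crc32 gives a deterministic integer id per shingle (Python's hash() is salted).
    ids = [zlib.crc32(s.encode("utf-8")) for s in sh]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]


def estimate_jaccard(sig1: list, sig2: list) -> float:
    """Fraction of signature rows that agree; an unbiased estimate of the Jaccard index."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def candidate_pair(sig1: list, sig2: list, bands: int, rows: int) -> bool:
    """Banding: candidate if some band of `rows` rows is identical in both signatures."""
    if len(sig1) != bands * rows or len(sig2) != bands * rows:
        raise ValueError("signature length must equal bands * rows")
    return any(sig1[i * rows:(i + 1) * rows] == sig2[i * rows:(i + 1) * rows]
               for i in range(bands))


def candidate_probability(s: float, bands: int, rows: int) -> float:
    """Probability that a pair with Jaccard similarity s becomes a candidate."""
    return 1.0 - (1.0 - s ** rows) ** bands


if __name__ == "__main__":
    d1 = "the quick brown fox jumps over the lazy dog"
    d2 = "the quick brown fox jumped over the lazy dog"
    s1, s2 = shingles(d1, 3), shingles(d2, 3)
    hashes = make_hashes(100, seed=1)  # 100 rows = 20 bands x 5 rows
    sig1, sig2 = minhash_signature(s1, hashes), minhash_signature(s2, hashes)
    print("exact Jaccard:    ", jaccard(s1, s2))
    print("minhash estimate: ", estimate_jaccard(sig1, sig2))
    print("candidate pair:   ", candidate_pair(sig1, sig2, bands=20, rows=5))
```

With b bands of r rows, a pair whose Jaccard similarity is s becomes a candidate with probability 1 - (1 - s^r)^b, which is what candidate_probability computes; the choice of b and r places the threshold around which this S-shaped curve rises, trading false positives against false negatives.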
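Many mixed-integer nonlinear programming (MINLP) problems arise in scientific and engineering system design. Such problems involve diverse variable types and numerous constraints, which makes them difficult to solve; to address this, an improved fruit fly optimization algorithm is proposed. The algorithm applies different update strategies to different types of variables and uses a periodic step-size function to guide the fruit flies' search, helping it avoid becoming trapped in local optima. The algorithm was compared with two other commonly used algorithms in terms of stability, convergence speed, and related criteria; the experimental results show that the improved fruit fly algorithm performs better and can effectively solve MINLP problems.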
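The abstract does not specify the update rules, so the following is only a hypothetical Python sketch of a fruit-fly-style search for MINLP under stated assumptions: integer variables receive rounded perturbations while continuous variables receive real-valued ones, the step size follows an assumed cosine-shaped schedule that restarts every period, and constraints are handled by an assumed quadratic penalty. The function names, the schedule, the parameter defaults, and the test problem are illustrative choices, not the authors' method.

```python
import math
import random


def periodic_step(t: int, period: int, s_min: float, s_max: float) -> float:
    """Assumed periodic schedule: decays from s_max toward s_min, then restarts each period."""
    phase = (t % period) / period
    return s_min + (s_max - s_min) * 0.5 * (1.0 + math.cos(math.pi * phase))


def foa_minlp(f, lower, upper, is_int, n_flies=30, iters=300, period=50,
              s_min=0.001, s_max=0.5, seed=0):
    """Hypothetical fruit-fly-style minimizer with mixed continuous/integer variables."""
    rng = random.Random(seed)
    dim = len(lower)

    def repair(x):
        # Clip every variable to its bounds; round the integer ones.
        out = []
        for j in range(dim):
            v = min(max(x[j], lower[j]), upper[j])
            out.append(round(v) if is_int[j] else v)
        return out

    # Random initial swarm location inside the bounds.
    swarm = repair([rng.uniform(lower[j], upper[j]) for j in range(dim)])
    best_x, best_f = swarm, f(swarm)
    for t in range(iters):
        step = periodic_step(t, period, s_min, s_max)
        flies = []
        for _ in range(n_flies):
            cand = []
            for j in range(dim):
                delta = step * (upper[j] - lower[j]) * rng.uniform(-1.0, 1.0)
                # Different update strategy per variable type (assumed):
                # integer variables move by whole units, continuous ones freely.
                cand.append(swarm[j] + (round(delta) if is_int[j] else delta))
            flies.append(repair(cand))
        gen_best = min(flies, key=f)
        gen_f = f(gen_best)
        if gen_f < best_f:
            # The swarm flies toward the best "smell" found so far.
            best_x, best_f = gen_best, gen_f
            swarm = gen_best
    return best_x, best_f


def objective(x):
    # Illustrative MINLP: x[0] continuous in [-5, 5], x[1] integer in [0, 10],
    # constraint x[0] + x[1] <= 4.5 enforced by a quadratic penalty.
    x0, x1 = x
    violation = max(0.0, x0 + x1 - 4.5)
    return (x0 - 1.5) ** 2 + (x1 - 3) ** 2 + 1e3 * violation ** 2


if __name__ == "__main__":
    best_x, best_f = foa_minlp(objective, [-5.0, 0], [5.0, 10], [False, True])
    print(best_x, best_f)
```

Restarting the step size at its maximum at the start of each period is one simple way to let the swarm escape a basin after a phase of fine local search; whether the paper's periodic function has this shape is not stated in the abstract.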