The probability-based covering algorithm (PBCA) is a new algorithm based on probability distribution. It decides, by voting, the class of tested samples on the border of the coverage area, based on the probability of training samples. With the original covering algorithm (CA), many tested samples located on the border of the coverage cannot be classified by the spherical neighborhoods obtained. The network structure of PBCA is a mixed structure composed of both a feed-forward network and a feedback network. By adding some heterogeneous samples and enlarging the coverage radius, it is possible to decrease the number of rejected samples and improve recognition accuracy. Computer experiments indicate that the algorithm improves learning precision and achieves reasonably good results in text classification.
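The rejection problem described above, and one way a vote could resolve it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the PBCA paper: the `Cover` record, the `enlarge` factor, and the choice to weight each vote by the number of training samples a cover contains are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Cover:
    """A spherical neighborhood produced by a covering algorithm (hypothetical record)."""
    center: tuple
    radius: float
    label: str
    n_covered: int  # number of training samples inside this cover


def classify_plain(x, covers):
    """Plain covering rule: label of the first cover containing x, else None (rejected)."""
    for c in covers:
        if math.dist(x, c.center) <= c.radius:
            return c.label
    return None


def classify_with_vote(x, covers, enlarge=1.5):
    """Fall back to a weighted vote for samples the plain rule rejects.

    Assumption: every cover whose radius, enlarged by `enlarge`, reaches x
    casts `n_covered` votes for its class. The sample is still rejected
    (None) when no enlarged cover reaches it.
    """
    label = classify_plain(x, covers)
    if label is not None:
        return label
    votes = Counter()
    for c in covers:
        if math.dist(x, c.center) <= enlarge * c.radius:
            votes[c.label] += c.n_covered
    return votes.most_common(1)[0][0] if votes else None
```

With two covers of radius 1 centered at (0, 0) and (3, 0), a sample at (1.4, 0) is rejected by `classify_plain` but receives a label from `classify_with_vote`, because only the first cover reaches it once the radii are enlarged.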
Multiple-Instance Learning (MIL) is used to predict the labels of unlabeled bags by learning from labeled positive and negative training bags. Each bag is made up of several unlabeled instances. A bag is labeled positive if at least one of its instances is positive, and negative otherwise. Existing multiple-instance learning methods with instance selection ignore the representative degree of the selected instances. For example, if an instance has many similar instances with the same label around it, it should be more representative than others. Based on this idea, this paper proposes multiple-instance learning with instance selection via a constructive covering algorithm (MilCa). In MilCa, we first use the maximal Hausdorff distance to select some initial positive instances from positive bags, then use a Constructive Covering Algorithm (CCA) to restructure the original instances of negative bags. An inverse testing process is then employed to exclude the false positive instances from positive bags and to select, from the training bags, the instances with a high representative degree, ordered by the number of instances they cover. Finally, a similarity measure function is used to convert each training bag into a single sample, and CCA is used again to classify the converted samples. Experimental results on synthetic data and standard benchmark datasets demonstrate that MilCa can decrease the number of selected instances and is competitive with state-of-the-art MIL algorithms.
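The bag-labeling rule and the maximal Hausdorff distance between two bags can be written down directly. The sketch below assumes Euclidean distance between instances and represents a bag as a list of coordinate tuples; the function names are illustrative, and how MilCa uses the distance to pick initial positive instances is not shown here.

```python
import math


def bag_label(instance_labels):
    """A bag is positive if at least one instance is positive, otherwise negative."""
    return any(instance_labels)


def directed_hausdorff(bag_a, bag_b):
    """Largest distance from an instance of bag_a to its nearest instance in bag_b."""
    return max(min(math.dist(a, b) for b in bag_b) for a in bag_a)


def max_hausdorff(bag_a, bag_b):
    """Maximal (symmetric) Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(bag_a, bag_b), directed_hausdorff(bag_b, bag_a))
```

The maximal form is sensitive to a single outlying instance in either bag, which is why a distance of this kind can single out instances that lie far from the other bag.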
In this paper, a new covering algorithm called FCV1 is presented. FCV1 comprises two algorithms: the first quickly searches for a partial rule and excludes a large portion of the negative examples, and the second incorporates a more optimized greedy set-covering algorithm and runs on a small portion of the training examples. Hence, the training process of FCV1 is much faster than that of AQ15.
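For readers unfamiliar with greedy set covering, the textbook version is sketched below. It is not FCV1's optimized variant, which the abstract does not specify; the function name and the input layout (a universe of items and a list of candidate subsets) are assumptions made for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy set cover.

    Repeatedly picks the subset that covers the most still-uncovered items
    and returns the indices of the chosen subsets in selection order.
    `subsets` must be a list of Python sets.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the subset with the largest overlap with what is still uncovered.
        best = max(range(len(subsets)), key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```

In a rule-learning setting, one plausible reading is that the universe holds the examples still to be covered and each subset holds the examples matched by one candidate rule, so the greedy choice corresponds to picking the rule that explains the most remaining examples.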
Complex networks have recently attracted much attention in diverse areas of science and technology. Many networks, such as the WWW and biological networks, are known to display spatial heterogeneity that can be characterized by their fractal dimensions. Multifractal analysis is a useful way to systematically describe the spatial heterogeneity of both theoretical and experimental fractal patterns. In this paper, we introduce a new box-covering algorithm for multifractal analysis of complex networks. The algorithm is used to calculate the generalized fractal dimensions D_q of some theoretical networks, namely scale-free networks, small-world networks, and random networks, and of one kind of real network, namely protein-protein interaction networks of different species. Our numerical results indicate the existence of multifractality in scale-free networks and protein-protein interaction networks, while the multifractal behavior is not clear-cut for small-world networks and random networks. The possible variation of D_q due to changes in the parameters of the theoretical network models is also discussed.
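Once a box covering is available at several box sizes, the generalized dimensions can be estimated with the standard partition-function definition: D_q is the slope of ln(sum_i mu_i^q) / (q - 1) against ln(epsilon), with sum_i mu_i ln(mu_i) used in place of the first quantity when q = 1. The sketch below assumes the box measures mu_i are already computed (for a network, for example, as the fraction of nodes in each box) and fits the slope by ordinary least squares; the box-covering step itself and the function names are not taken from the paper.

```python
import math


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def generalized_dimension(box_measures, q):
    """Estimate D_q from box measures at several box sizes.

    box_measures maps each box size epsilon to the list of box measures
    mu_i at that size (each list sums to 1). At least two distinct box
    sizes are needed for the regression.
    """
    xs, ys = [], []
    for eps, mus in box_measures.items():
        mus = [m for m in mus if m > 0]  # empty boxes contribute nothing
        xs.append(math.log(eps))
        if q == 1:
            # Information dimension: limit of the general formula as q -> 1.
            ys.append(sum(m * math.log(m) for m in mus))
        else:
            ys.append(math.log(sum(m ** q for m in mus)) / (q - 1))
    return _slope(xs, ys)
```

For a uniform measure on the unit interval, where a box size of 2^-k yields 2^k boxes of equal mass, this estimate gives D_q = 1 for every q, which is the monofractal baseline; a D_q curve that decreases noticeably with q is the usual signature of multifractality.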
Mining from ambiguous data is very important in data mining. This paper discusses one of the tasks in mining from ambiguous data, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all of its instances are negative. A bag is positive if it has at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The aim is to classify unseen bags. The main idea of existing multi-instance algorithms is to find the true positive instances in positive bags, convert the multi-instance problem into a supervised problem, and obtain the labels of test bags by predicting the labels of unknown instances. In this paper, we approach multi-instance data from another point of view: excluding the false positive instances in positive bags and predicting the label of an entire unknown bag. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining from multi-instance data. Briefly, a constructive covering algorithm is first used to restructure the original multi-instance data. Then, the kNN algorithm is applied to discriminate the false positive instances. In the test stage, we label the tested bag directly according to the similarity between the unseen bag and the sphere neighbors obtained from the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
In today's world of rapid technological development, the sustainability and adaptability of computer applications is a challenge, and predicting the future has become significant. Strong artificial intelligence (AI) has therefore become important, and statistical machine learning (ML) methods have been applied to serve it. These methods are very difficult to understand, and they make predictions without showing how. However, understanding how machines make their decisions is also important, especially in the information systems domain. Incremental covering algorithms (CA) can therefore be used to produce simple rules for making difficult decisions. Although using a simple CA as the basis of a strong AI agent would be a novel idea, it is not possible with the methods currently available in CA. Accurately updating the discovered rules based on new information was found to be a challenge in CA that needs extra attention. Specifically, incomplete data with missing classes is handled inappropriately, speed and data size are also a concern, and classes that do not yet exist are neglected. This paper therefore introduces a novel algorithm called RULES-IT to solve the problems of incremental CA and to introduce it into strong AI. The algorithm is the first incremental algorithm in its family, and in CA as a whole, that transfers rules across domains to improve performance, generalize the induction, take advantage of past experience in different domains, and make the learner more intelligent. It is also the first to introduce intelligent aspects into incremental CA, including consciousness, subjective emotions, awareness, and adjustment. Furthermore, all decisions made can be understood because the repository is represented simply as rules. Finally, the performance of RULES-IT is benchmarked against six different methods and compared with its predecessors to show the effect of transferring rules in the learning process and to demonstrate how RULES-IT addresses the shortcomings of current incremental CA while improving overall performance.
Funding (PBCA paper): Supported by the Fund for Philosophy and Social Science of Anhui Province and the Fund for Human and Art Social Science of the Education Department of Anhui Province (Grant Nos. AHSKF0708D13 and 2009sk038).
Funding (MilCa paper): Supported by the National Natural Science Foundation of China (No. 61175046); the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016); the Outstanding Young Talents in Higher Education Institutions of Anhui Province (No. 2011SQRL146); and the Recruitment Project of Anhui University for Academic and Technology Leader.
Funding (box-covering multifractal analysis paper): Project supported by the Australian Research Council (Grant No. DP0559807); the National Natural Science Foundation of China (Grant No. 11071282); the Science Fund for Changjiang Scholars and Innovative Research Team in University (PCSIRT) (Grant No. IRT1179); the Program for New Century Excellent Talents in University (Grant No. NCET-08-06867); the Research Foundation of the Education Department of Hunan Province of China (Grant No. 11A122); the Natural Science Foundation of Hunan Province of China (Grant No. 10JJ7001); the Science and Technology Planning Project of Hunan Province of China (Grant No. 2011FJ2011); the Lotus Scholars Program of Hunan Province of China; the Aid Program for Science and Technology Innovative Research Team in Higher Education Institutions of Hunan Province of China; and a China Scholarship Council-Queensland University of Technology Joint Scholarship.
Funding (MICkNN paper): Supported by the National Natural Science Foundation of China (Nos. 61073117 and 61175046); the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016); the Academic Innovative Research Projects of Anhui University Graduate Students (No. 10117700183); and the 211 Project of Anhui University.