Funding: Supported by the National Natural Science Foundation of China (62201398), the Natural Science Foundation of Zhejiang Province (LY21F020001), and the Science and Technology Plan Project of Wenzhou (ZG2020026).
Abstract: Active learning (AL) trains a high-precision predictor from a small amount of labeled data by iteratively selecting the most valuable samples from an unlabeled data pool and annotating them with class labels throughout the learning process. However, most current AL methods assume that the labels queried in each AL round are unambiguous, which may be unrealistic in real-world applications where only a set of candidate labels can be obtained for the selected data. Moreover, most existing AL algorithms consider only centralized processing, which requires gathering all the unlabeled data at a single fusion center for selection. Because data in many real-world scenarios are collected and stored at different nodes over a network, distributed processing is adopted here. This paper addresses the distributed classification of partially labeled (PL) data obtained by a fully decentralized AL method and proposes a distributed active partial label learning (dAPLL) algorithm. The proposed algorithm consists of a fully decentralized sample selection strategy and a distributed partial label learning (PLL) algorithm. During sample selection, both the uncertainty and the representativeness of the data are measured with respect to global cluster centers obtained by a distributed clustering method, and valuable samples are selected in turn. Meanwhile, a disambiguation-free strategy is used to construct a series of binary classification problems, and the corresponding cost-sensitive classifiers are trained cooperatively in a distributed manner. Experimental results on several datasets demonstrate that the performance of dAPLL is comparable to that of the corresponding centralized method and superior to that of the existing active PLL (APLL) method under different parameter configurations. In addition, the proposed algorithm outperforms several current PLL methods that use the random selection strategy, especially when only a small amount of data is selected for candidate-label annotation.
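The sample-selection step can be illustrated with a minimal sketch. It is not the dAPLL procedure itself: the abstract does not specify the clustering method or how uncertainty and representativeness are computed, so every choice below is an assumption. Global centers come from a k-means variant in which nodes share only per-cluster sums and counts (exact summation stands in for a decentralized consensus protocol), uncertainty is taken as the ratio of the distances to the two nearest centers, representativeness as exp(-d / tau) of the distance to the nearest center, and the functions `distributed_kmeans`, `selection_scores`, and `select_in_turn` are hypothetical names.

```python
import numpy as np


def distributed_kmeans(node_data, k, n_iter=20):
    """k-means in which nodes exchange only per-cluster sums and counts (k >= 2).

    Assumption: node 0 picks initial centers by farthest-first traversal and
    broadcasts them; exact summation of the statistics stands in for a
    decentralized consensus/averaging protocol.
    """
    X0 = node_data[0]
    centers = [X0[0]]
    for _ in range(1, k):
        d2 = ((X0[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X0[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        sums = np.zeros_like(centers)
        counts = np.zeros(k)
        for X in node_data:  # purely local computation on each node
            assign = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
            for c in range(k):
                sums[c] += X[assign == c].sum(axis=0)
                counts[c] += np.count_nonzero(assign == c)
        nonempty = counts > 0
        centers[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centers


def selection_scores(X, centers, tau=1.0):
    """Hypothetical score: uncertainty times representativeness, both in [0, 1]."""
    d = np.sqrt(((X[:, None, :] - centers[None]) ** 2).sum(-1))
    d_sorted = np.sort(d, axis=1)
    # Uncertainty: near 1 when the two nearest global centers are almost equidistant
    uncertainty = d_sorted[:, 0] / (d_sorted[:, 1] + 1e-12)
    # Representativeness: near 1 when the sample lies close to its own center
    representativeness = np.exp(-d_sorted[:, 0] / tau)
    return uncertainty * representativeness


def select_in_turn(node_data, centers, budget):
    """Each round, every node proposes its best unselected sample and the
    highest-scoring proposal across nodes is queried (max-consensus assumed)."""
    taken = [set() for _ in node_data]
    chosen = []  # list of (node index, local sample index)
    for _ in range(budget):
        best = None
        for v, X in enumerate(node_data):
            s = selection_scores(X, centers)
            s[list(taken[v])] = -np.inf  # skip samples already queried
            i = int(np.argmax(s))
            if best is None or s[i] > best[0]:
                best = (s[i], v, i)
        _, v, i = best
        taken[v].add(i)
        chosen.append((v, i))
    return chosen
```

Multiplying the two terms favors samples that sit near a cluster boundary while still lying in a reasonably dense region; an additive trade-off with a weighting parameter would be an equally plausible assumption.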
Funding: Supported by the National Natural Science Foundation of China (62176197, 61806155) and the National Natural Science Foundation of Shaanxi Province (2020GY-062).
Abstract: Partial label learning aims to learn a multi-class classifier from training examples that are each associated with a set of candidate labels, only one of which is correct. Most studies of the label space have focused only on the difference between candidate and non-candidate labels; so far, however, there has been little discussion of label correlation in partial label learning. This paper first studies label correlation and then establishes a unified framework that integrates label correlation, an adaptive graph, and the semantic difference maximization criterion, offering fresh insight into how learning information can be acquired from the label space. Specifically, label correlation is calculated from the candidate label sets and used to obtain the similarity of each pair of instances in the label space. The labeling confidence of each instance is then updated under the smoothness assumption that instances close in the feature space should have similar outputs in the label space. Finally, an effective optimization procedure is used to solve the unified framework. Extensive experiments on artificial and real-world data sets indicate that the proposed method outperforms state-of-the-art partial label learning methods.
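The two label-space steps described above can be sketched as follows, assuming a binary candidate matrix `Y` (n instances by q labels). This is a simplified stand-in, not the paper's unified optimization: label correlation is assumed to be the cosine similarity between label columns of `Y`, label-space instance similarity is assumed to be the row-normalized `Y C Y^T`, and the smoothness assumption is enforced by label propagation over a k-nearest-neighbor feature graph, with confidences masked to each candidate set after every step. The adaptive graph and the semantic difference maximization term are omitted, and all function names and parameters (`k`, `sigma`, `alpha`) are hypothetical.

```python
import numpy as np


def label_correlation(Y):
    """Cosine similarity between the label columns of the n x q candidate matrix Y."""
    co = Y.T @ Y  # how often each pair of labels appears in the same candidate set
    norms = np.sqrt(np.diag(co))
    norms[norms == 0] = 1.0  # labels that never appear get zero correlation
    return co / np.outer(norms, norms)


def label_space_similarity(Y, C):
    """Row-normalized instance similarity induced by correlated candidate labels."""
    S = Y @ C @ Y.T
    np.fill_diagonal(S, 0.0)
    rows = S.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0
    return S / rows


def knn_feature_graph(X, k=1, sigma=1.0):
    """Symmetric, row-normalized Gaussian-weighted k-NN graph in feature space."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in np.argsort(d2[i]) if j != i][:k]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = (W + W.T) / 2
    rows = W.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0
    return W / rows


def propagate_confidence(W, Y, alpha=0.8, n_iter=50):
    """Smoothness: neighbors in feature space pass confidence to each other,
    and mass outside each candidate set is zeroed after every step."""
    prior = Y / Y.sum(axis=1, keepdims=True)  # uniform over each candidate set
    F = prior.copy()
    for _ in range(n_iter):
        F = alpha * (W @ F) + (1 - alpha) * prior
        F = F * Y  # candidate-set constraint
        F = F / F.sum(axis=1, keepdims=True)
    return F
```

For example, if an ambiguous instance with candidates {0, 1} is linked only to an instance whose sole candidate is label 0, its confidence converges to 0.9 on label 0 and 0.1 on label 1 with `alpha=0.8`.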
Funding: Supported by the Indigenous Innovation’s Capability Development Program of Huizhou University (HZU202003, HZU202020), the Natural Science Foundation of Guangdong Province (2022A1515011463), the Project of Educational Commission of Guangdong Province (2023ZDZX1025), the National Natural Science Foundation of China (12271473), and Guangdong Province’s 2023 Education Science Planning Project (Higher Education Special Project) (2023GXJK505).
Abstract: Complementary-label learning (CLL) aims to learn a classifier from samples with complementary labels, that is, labels specifying a class the sample does not belong to. Such data are considered to contain less information than samples with ordinary labels, and transition matrices between the true and complementary labels, as well as several loss functions, have been developed to handle this problem. In this paper, we show that CLL can be transformed into ordinary classification under some mild conditions, which indicates that complementary labels can supply enough information in most cases. As an example, an extensive misclassification error analysis is performed for the kernel ridge regression (KRR) method applied to multiple complementary-label learning (MCLL), demonstrating its superior performance compared with existing approaches.
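To make the idea of turning complementary labels into an ordinary learning problem concrete, here is a minimal sketch under stated assumptions; it is not the paper's construction. Each sample's complementary labels receive target 0, the remaining labels share the mass uniformly, and an ordinary multi-output kernel ridge regression is fit to these targets. With a single complementary label drawn uniformly from the K - 1 wrong classes, the expected target for class k is (K - 2 + p_k(x)) / (K - 1)^2, which increases with the class posterior p_k(x), so the argmax of the regression function coincides with the Bayes classifier. The RBF kernel, the values of `gamma` and `lam`, and all function names are hypothetical choices.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def complementary_targets(comp_sets, n_classes):
    """Hypothetical encoding: 0 on every complementary label, uniform mass elsewhere."""
    T = np.zeros((len(comp_sets), n_classes))
    for i, comp in enumerate(comp_sets):
        allowed = [k for k in range(n_classes) if k not in comp]
        T[i, allowed] = 1.0 / len(allowed)
    return T


def fit_krr(X, T, lam=0.1, gamma=0.5):
    """Closed-form kernel ridge regression: alpha = (K + lam * n * I)^{-1} T."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), T)


def predict(X_train, alpha, X_new, gamma=0.5):
    """Predict the class whose regressed target score is largest."""
    return (rbf_kernel(X_new, X_train, gamma) @ alpha).argmax(axis=1)


def demo(seed=0, n_per_class=100):
    """Three Gaussian blobs; each training sample gets one complementary label
    drawn uniformly from its wrong classes. Returns accuracy on fresh samples."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])

    def sample(m):
        y = np.repeat(np.arange(3), m)
        return centers[y] + 0.5 * rng.standard_normal((3 * m, 2)), y

    X, y = sample(n_per_class)
    comp = [[int(rng.choice([k for k in range(3) if k != yi]))] for yi in y]
    alpha = fit_krr(X, complementary_targets(comp, 3))
    X_test, y_test = sample(50)
    return float((predict(X, alpha, X_test) == y_test).mean())
```

The ridge term `lam * n` is deliberately non-negligible: it makes the regression average over many noisy targets instead of interpolating each one, which is what lets the expected-target argument above take effect on finite data.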
Funding: Supported by the National Key Research and Development Plan of China under Grant Nos. 2017YFB1400700 and 2018YFB1004401, the National Natural Science Foundation of China under Grant Nos. 61732006, 61702522, 61772536, 61772537, 62076245, and 62072460, and the Beijing Natural Science Foundation under Grant No. 4212022.
Abstract: Partial label learning is a weakly supervised learning framework in which each instance is associated with multiple candidate labels, only one of which is the ground-truth label. This paper proposes a unified formulation that employs proper label constraints to train models while simultaneously performing pseudo-labeling. Unlike existing partial label learning approaches, which leverage only similarities in the feature space without utilizing label constraints, our pseudo-labeling process leverages both similarities and differences in the feature space under the same candidate label constraints and then disambiguates noisy labels. Extensive experiments on artificial and real-world partial label datasets show that our approach significantly outperforms state-of-the-art counterparts in classification prediction.
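Pseudo-labeling that uses both feature-space similarities and differences under a candidate label constraint can be sketched as follows. This is a hypothetical scoring rule, not the paper's formulation: each instance's `k_sim` most similar instances vote for the labels in their (uniformly weighted) candidate sets, its `k_diff` most dissimilar instances vote against theirs with weight `beta`, and the pseudo-label is the highest-scoring label inside the instance's own candidate set. The function name and all parameters are assumptions.

```python
import numpy as np


def pseudo_label(X, Y, k_sim=3, k_diff=3, beta=0.5):
    """Hypothetical pseudo-labeling with candidate-label constraints.

    X: (n, d) features; Y: (n, q) binary candidate matrix (no empty rows).
    Similar instances add evidence for their candidate labels; dissimilar
    instances subtract evidence for theirs, weighted by beta.
    """
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = Y / Y.sum(axis=1, keepdims=True)  # uniform confidence over each candidate set
    labels = np.empty(n, dtype=int)
    for i in range(n):
        order = [j for j in np.argsort(d2[i]) if j != i]  # nearest first, self excluded
        near, far = order[:k_sim], order[-k_diff:]
        score = P[near].sum(axis=0) - beta * P[far].sum(axis=0)
        # Candidate-label constraint: only labels in instance i's set are eligible
        score = np.where(Y[i] > 0, score, -np.inf)
        labels[i] = int(np.argmax(score))
    return labels
```

Restricting the argmax to the instance's own candidate set is what enforces the label constraint; without the `np.where` line, a label outside the candidate set could be chosen as the pseudo-label.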