Abstract
The performance of self-training algorithms depends largely on how accurately high-confidence samples are identified. Inspired by the DPC algorithm, this paper uses density peaks to define a prototype relationship between samples and constructs a new data structure, the direct relative node graph. On this basis, a self-training algorithm with editing direct relative node graph (DRNG) is proposed. DRNG selects high-confidence samples by hypothesis testing and adds them to the labeled sample set for iterative training. Because misclassified high-density sample points have a greater impact on the classification performance of self-training, DRNG defines an asymmetric weight for cut edges in the direct relative node graph that accounts for both distance and density. This increases the cut edge weight of high-density points, raises the probability that they fall outside the rejection region, and thereby reduces the risk incurred by misclassifying them. To verify its performance, DRNG is compared with 4 state-of-the-art algorithms on 8 benchmark datasets, and the experimental results verify its effectiveness.
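The density-peak step that the abstract builds on can be illustrated with a minimal sketch. It follows the general DPC construction: each sample gets a local density, and each sample except the densest one is linked to its nearest neighbor of higher density. The Gaussian kernel, the cutoff parameter `d_c`, and the function name `density_peak_parents` are assumptions made for illustration. The abstract does not give the paper's exact definitions of the prototype relationship, the cut edge weights, or the hypothesis test, so none of those are reproduced here.

```python
import numpy as np

def density_peak_parents(X, d_c):
    """DPC-style densities and nearest higher-density neighbors.

    Hypothetical helper for illustration only; not the paper's code.
    Returns (rho, delta, parent), where parent[i] == -1 marks the
    densest sample, which has no higher-density neighbor.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Gaussian-kernel local density (an assumed choice); subtract 1
    # to remove each sample's contribution to its own density.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    parent = np.full(n, -1)
    delta = np.zeros(n)
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density

    for rank, i in enumerate(order):
        if rank == 0:
            # Densest sample: by DPC convention, delta is its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]                   # samples ranked denser than i
        j = higher[np.argmin(dist[i, higher])]  # nearest of those
        parent[i] = j
        delta[i] = dist[i, j]

    return rho, delta, parent
```

For example, calling `density_peak_parents([[0.0], [1.0], [1.5]], d_c=1.0)` links both outer points to the middle one, which has the highest density. A graph built from such parent links is one plausible starting point for a structure like the direct relative node graph, but that correspondence is an assumption rather than something the abstract states.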
Authors
LIU Xuewen (刘学文)
WANG Jikui (王继奎)
YANG Zhengguo (杨正国)
YI Jihai (易纪海)
LI Bing (李冰)
NIE Feiping (聂飞平)
Affiliations: School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, China; School of Computer Science, Center for Optical Imagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journals
2022, Issue 14, pp. 144-152 (9 pages)
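Funding
General Program of the National Natural Science Foundation of China (61772427)
Young Scientists Fund of the National Natural Science Foundation of China (11801345)
Innovation Capability Improvement Project of Higher Education Institutions of Gansu Province (2021B-145)
Natural Science Foundation of Gansu Province (21JR11RA132)
Scientific Research Project of Lanzhou University of Finance and Economics (Lzufe2020B-011)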
Keywords
direct relative node graph
semi-supervised classification
density peak
self-training