Analysis of Semi-Supervised Text Clustering Algorithm on Marine Data

Abstract: Semi-supervised clustering improves learning performance by using a small number of labeled samples to guide learning on unlabeled samples. This paper implements and compares unsupervised and semi-supervised clustering analyses of BOA-Argo ocean text data. K-Means and Affinity Propagation (AP) are two classical unsupervised clustering algorithms. Because the final number of clusters in AP clustering has proved difficult to keep within a suitable range, the Election-AP algorithm is proposed to control it. For the semi-supervised analysis, thermocline data in the BOA-Argo dataset are sampled according to the standard definition of the thermocline, and these data are used for semi-supervised cluster analysis. Several semi-supervised clustering algorithms were chosen to compare learning performance: Constrained-K-Means, Seeded-K-Means, SAP (Semi-supervised Affinity Propagation), LSAP (Loose Seed AP) and CSAP (Compact Seed AP). To adapt them to single-label data, this paper improves these algorithms into SCKM (improved Constrained-K-Means), SSKM (improved Seeded-K-Means) and SSAP (improved Semi-supervised Affinity Propagation), which are then used for semi-supervised clustering analysis of the data. A DSAP (Double Seed AP) semi-supervised clustering algorithm based on compact seeds is also proposed, and the experimental data show that DSAP achieves a better clustering effect. The unsupervised and semi-supervised clustering results are used to analyze potential patterns in the marine data.
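The abstract names Seeded-K-Means and Constrained-K-Means as baselines and notes that only a single class (thermocline samples) is labeled. The sketch below shows the two baselines in their commonly described form:

- Seeded-K-Means initializes centers from the means of the labeled seeds.
- Constrained-K-Means additionally clamps the seeds to their given labels during every assignment step.

This is not the paper's SSKM or SCKM. The function name `seeded_kmeans`, its parameters, and the rule of starting unseeded clusters at randomly chosen unlabeled samples are illustrative assumptions.

```python
import numpy as np


def seeded_kmeans(X, k, seed_idx, seed_labels, constrained=False,
                  n_iter=100, rng=None):
    """Sketch of Seeded-K-Means (constrained=False) and
    Constrained-K-Means (constrained=True). Hypothetical helper.

    X           : (n, d) array of samples
    k           : number of clusters
    seed_idx    : indices of the labeled (seed) samples
    seed_labels : cluster id in [0, k) for each seed
    Returns (labels, centers).
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    seed_idx = np.asarray(seed_idx, dtype=int)
    seed_labels = np.asarray(seed_labels, dtype=int)

    # Seeded clusters start at the mean of their labeled samples.
    centers = np.empty((k, X.shape[1]))
    missing = []
    for c in range(k):
        members = seed_idx[seed_labels == c]
        if members.size:
            centers[c] = X[members].mean(axis=0)
        else:
            missing.append(c)

    # ASSUMPTION (not from the paper): clusters without any seed, e.g. when
    # only one class such as "thermocline" is labeled, start at distinct
    # randomly chosen unlabeled samples.
    unlabeled = np.setdiff1d(np.arange(len(X)), seed_idx)
    picks = rng.choice(unlabeled, size=len(missing), replace=False)
    for c, p in zip(missing, picks):
        centers[c] = X[p]

    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assign every sample to its nearest center (squared Euclidean).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if constrained:
            # Constrained-K-Means: seeds never leave their given cluster.
            new_labels[seed_idx] = seed_labels
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the loop has converged
        labels = new_labels
        # Recompute each non-empty cluster's center as its member mean.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
    # Only class 0 is labeled, mimicking a single-label setting.
    labels, _ = seeded_kmeans(X, k=2, seed_idx=[0, 1], seed_labels=[0, 0],
                              constrained=True, rng=0)
    print(labels)  # [0 0 0 0 1 1 1 1]
```

The only difference between the two variants is the clamping step. In Seeded-K-Means the seeds fix only the starting centers and may later change cluster. In Constrained-K-Means they keep their labels at every iteration, even when a seed lies closer to another center.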

Authors: Yu Jiang, Dengwen Yu, Mingzhao Zhao, Hongtao Bai, Chong Wang, Lili He

Affiliations: College of Computer Science and Technology; Key Laboratory of Symbolic Computation and Knowledge Engineering; Department of Engineering Mechanics

Source: Computers, Materials & Continua (SCIE, EI), 2020, Issue 7, pp. 207-216 (10 pages). English-language journal.

Funding: This work was supported in part by the National Natural Science Foundation of China (51679105, 61872160, 51809112) and the "Thirteenth Five-Year Plan" Science and Technology Project of the Education Department of Jilin Province (JJKH20200990KJ).

Keywords: unsupervised learning; semi-supervised learning; text clustering

Classification number (CLC): TP3 [Automation and Computer Technology: Computer Science and Technology]

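References (2)

1. Wang Kaijun, Li Jian, Zhang Junying, Tu Chongyang. Semi-supervised affinity propagation clustering[J]. Computer Engineering, 2007, 33(23): 197-198. Cited by: 29
2. Xiao Yu, Yu Jian. Semi-supervised clustering based on affinity propagation algorithm[J]. Journal of Software, 2008, 19(11): 2803-2813. Cited by: 165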
