Abstract
Link prediction is the core, most fundamental research problem in social network research. Using an academic co-authorship network, this paper applies eight existing classical machine learning algorithms to link prediction. To address the incomplete feature sets used by existing supervised learning approaches, three categories of features are extracted. To address the highly skewed data, under-sampling is used so that the models do not become biased toward the majority class, which preserves the validity of the classifiers. Experimental results show that Adaboost and the Multi-Layer Perceptron (a multi-layer feed-forward neural network) outperform the other six models in precision, recall, and F1-measure, while Naive Bayes performs the worst.
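The under-sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which under-sampling variant the paper uses, so the random under-sampling below, the function name `undersample`, the toy author pairs, and the 1:1 class ratio are all assumptions made for illustration, not the paper's actual procedure.

```python
import random


def undersample(samples, labels, majority_label=0, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    Hypothetical helper: the 1:1 target ratio is an assumption, not a value
    taken from the paper.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Keep as many majority samples as there are minority samples.
    kept = rng.sample(majority, min(len(minority), len(majority)))
    keep = sorted(kept + minority)
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy candidate author pairs: label 1 = a co-authorship link forms, 0 = no link.
pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("c", "e"), ("d", "e"), ("a", "e"), ("b", "e")]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

balanced_pairs, balanced_labels = undersample(pairs, labels)
```

In this sketch all positive pairs are retained and only negatives are discarded, so a classifier trained on `balanced_pairs` sees both classes equally often.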
Author
Zhao Sufen (赵素芬)
School of Computer, Central China Normal University, Wuhan, Hubei 430079, China; School of Computer Science, Wuhan University
Source
Computer Era (《计算机时代》)
2019, No. 1, pp. 39-42, 45 (5 pages)
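Funding
National Natural Science Foundation of China (61170026)
National Key Research and Development Program of China (2017YFB0503700, 2016YFB0501801)
National Standards Research Program (2016BZYJ-WG7-001)
Fundamental Research Funds for the Central Universities, Central China Normal University, Young Teachers' Innovation Project (CCNU18QN019)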
Keywords
social networks
link prediction
machine learning
supervised learning
data skewness