
Advances in Active Learning Algorithms Based on Sampling Strategy
Cited by: 33
Abstract: Active learning algorithms select the most informative unlabeled instances and hand them to human experts for labeling. Repeating this cycle gradually improves the accuracy of the classifier, so that a classifier with strong generalization ability is obtained at minimal total labeling cost. The technique has attracted wide attention from researchers at home and abroad. This paper surveys active learning with an emphasis on sampling strategies. It describes in detail how the learning engine and the sampling engine operate iteratively, summarizes the existing theoretical results on active learning, and reviews the current state of research and its development. First, active learning algorithms are divided into three main classes according to how the sampling strategy selects examples. Then the algorithms built on each sampling strategy are analyzed and compared in depth, and their application domains, advantages, and shortcomings are discussed. Finally, open problems and directions for further research are pointed out.
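Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2012, No. 6, pp. 1162-1173 (12 pages)
Funding: National Natural Science Foundation of China (61171185, 60932008, 60832010); China Postdoctoral Science Foundation Special Funding (201003446)
Keywords: machine learning; active learning; sampling strategy; labeling cost; instance selection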
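
The loop described in the abstract, in which a sampling engine picks an instance, an expert labels it, and a learning engine retrains, can be illustrated with a small sketch. This is not the paper's algorithm. It assumes a toy 1-D threshold classifier, an `oracle` function standing in for the human expert, and uncertainty sampling (querying the point closest to the current decision boundary) as one example of a sampling strategy. The names `train`, `select`, `active_learn`, and `accuracy` are hypothetical.

```python
def train(labeled):
    """Learning engine (toy assumption): fit a 1-D threshold classifier."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    # Place the boundary midway between the largest negative
    # and the smallest positive labeled point.
    return (max(neg) + min(pos)) / 2.0


def select(threshold, unlabeled):
    """Sampling engine: uncertainty sampling, i.e. the unlabeled point
    nearest the current decision boundary."""
    return min(unlabeled, key=lambda x: abs(x - threshold))


def active_learn(pool, oracle, budget):
    """Alternate sampling and retraining until the label budget is spent."""
    pool = sorted(pool)
    # Seed set: the two extreme points, assumed to carry different labels.
    labeled = [(pool[0], oracle(pool[0])), (pool[-1], oracle(pool[-1]))]
    unlabeled = pool[1:-1]
    threshold = train(labeled)
    for _ in range(budget):
        if not unlabeled:
            break
        x = select(threshold, unlabeled)   # sampling engine chooses a query
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))     # expert (oracle) provides the label
        threshold = train(labeled)         # learning engine retrains
    return threshold, labeled


def accuracy(threshold, pool, oracle):
    """Fraction of pool points the threshold classifier labels correctly."""
    return sum((x >= threshold) == bool(oracle(x)) for x in pool) / len(pool)


if __name__ == "__main__":
    pool = [i / 199 for i in range(200)]      # evenly spaced toy pool
    oracle = lambda x: int(x >= 0.6)          # hypothetical ground truth
    threshold, labeled = active_learn(pool, oracle, budget=15)
    print(len(labeled), accuracy(threshold, pool, oracle))
```

On this toy problem the sampling engine behaves like a binary search over the pool, so a small number of labels is enough to locate the boundary. Real sampling strategies and classifiers are considerably more involved.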

