Multi-Domain Active Learning in Text Classification (cited by: 3)

Abstract: Existing active learning methods mainly focus on training within a single domain. Different domains have different characteristics, but they also share some implicit commonalities. How to select suitable data samples from multiple domains is therefore the key to reducing manual labeling effort in multi-domain learning. This paper proposes a novel multi-domain active learning framework that takes duplicate information fully into account and selects suitable samples from multiple domains. The framework first finds a shared subspace that captures the implicit commonalities between domains, then decomposes every data sample into a common-domain part and a domain-specific part. The common-domain part can be regarded as information duplicated across domains, and it must be taken into account at query time. Finally, the proposed method is compared with state-of-the-art active learning methods; the experimental results show that it significantly reduces manual labeling effort.
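
The three steps named in the abstract (shared subspace, decomposition, redundancy-aware querying) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that PCA over the pooled domains stands in for the shared subspace, that the common part is the projection onto that subspace, and that redundancy is measured by cosine similarity between common parts. The function names and the `redundancy_weight` parameter are hypothetical.

```python
import numpy as np

def shared_subspace(domains, k):
    """Assumption: top-k principal directions of the pooled data act as the shared subspace."""
    pooled = np.vstack(domains)
    mean = pooled.mean(axis=0)
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (d, k) with orthonormal columns

def decompose(x, mean, basis):
    """Split centered samples into a common part (inside the subspace) and a domain-specific residual."""
    centered = x - mean
    common = centered @ basis @ basis.T
    specific = centered - common
    return common, specific

def pick_queries(common, uncertainty, budget, redundancy_weight=1.0):
    """Greedy selection: prefer uncertain samples, penalize those whose common part
    resembles the common part of an already selected sample (hypothetical scoring rule)."""
    n = len(common)
    norms = np.linalg.norm(common, axis=1, keepdims=True)
    unit = common / np.maximum(norms, 1e-12)
    available = np.ones(n, dtype=bool)
    max_sim = np.zeros(n)
    chosen = []
    for _ in range(min(budget, n)):
        score = np.where(available, uncertainty - redundancy_weight * max_sim, -np.inf)
        best = int(np.argmax(score))
        chosen.append(best)
        available[best] = False
        # Track each sample's highest similarity to anything selected so far.
        max_sim = np.maximum(max_sim, unit @ unit[best])
    return chosen
```

In this sketch, two samples from different domains that carry the same common part are unlikely to be queried together, which is one way to read the abstract's point that duplicated information should be considered when querying.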

Authors: Lai Juan (赖娟), Jin Peng (金澎), Hong Yanwei (洪艳伟)

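Affiliations: Laboratory of Intelligent Information Processing and Application, Leshan Normal University; School of Computer Science, Leshan Normal University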

Source: Journal of Southwest China Normal University (Natural Science Edition); indexed in CAS, CSCD, and the Peking University Core Journals list; 2014, No. 7, pp. 108-114 (7 pages)

Keywords: active learning; multi-domain learning; implicit commonalities; shared subspace

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]
