Improved Automatic Classification Algorithm of Software Bug Report (改进的软件错误报告自动分类算法)

Abstract: Automatic classification of software bug reports saves considerable time and human effort. However, bug reports submitted by users are highly subjective and loosely worded, which makes automatic classification ineffective. To address this, two improved feature dimension reduction algorithms are proposed. Both build on the traditional Term Frequency-Inverse Document Frequency (TF-IDF) algorithm and combine the frequency of a term within a document with the distribution of that term across documents of the same category and of different categories. After dimension reduction, the terms are weighted to further improve the effect of the reduction. Experimental results show that bug report classification with the proposed algorithms clearly outperforms existing algorithms in precision, recall, F1 score, and accuracy.
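
For orientation, the traditional TF-IDF baseline that the abstract builds on can be sketched as follows. This is only a minimal illustration of plain TF-IDF feature selection, not the two improved algorithms of the paper, whose formulas are not given in this record. The function names `tfidf_scores` and `select_features`, the "keep the k highest-scoring terms" rule, and the sample reports are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document.

    Uses the textbook form tf * log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def select_features(docs, k):
    """Hypothetical reduction rule: keep the k terms whose best
    TF-IDF score over the corpus is highest (ties broken by name)."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]


# Made-up bug report texts, already tokenized.
reports = [
    "app crash on startup crash".split(),
    "login button crash".split(),
    "typo on login page".split(),
]
selected = select_features(reports, 3)
print(selected)
```

Terms that occur in every document receive a score of zero under this formula, which is why plain TF-IDF says nothing about how a term is distributed across categories; the abstract indicates that the proposed algorithms add that category-level information.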

Authors: 黄伟 (Huang Wei), 林劼 (Lin Jie), 江育娥 (Jiang Yu'e), 江秉华 (Jiang Binghua)

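Institutions: Faculty of Software, Fujian Normal University; Department of Pathology, Nanjing Medical University

Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2015, Issue 6, pp. 183-187 (5 pages)

Funding: Major International (Regional) Joint Research Program of the National Natural Science Foundation of China (81320108019); Natural Science Foundation of Fujian Province (2014J01220)

Keywords: feature dimension reduction; bug report; automatic text classification; Term Frequency-Inverse Document Frequency (TF-IDF); feature weight; frequency

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]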
