
Hierarchical Software Categorization Based on Aggregation of Online Attributes
Abstract: Internet-scale software repositories are fundamentally changing traditional paradigms of software development, and efficient hierarchical categorization of the massive number of software projects in these repositories is of vital importance for Internet-based software development. Traditional classification approaches perform coarse-grained, flat categorization by analyzing source code or byte code, and most have been verified only on relatively small collections of software projects. In this paper, we propose a hierarchical categorization approach based on the aggregation of software online attributes and design a hierarchical categorization framework. Using a weighted aggregation of software descriptions and tags across multiple repositories, the approach categorizes massive software collections efficiently and hierarchically. Cross-validation experiments are carried out on more than 18,000 open source software projects. The results show that weighted aggregation of different online attributes significantly improves categorization performance. Under coarse-grained flat categorization, our approach achieves performance close to that of approaches based on source code or byte code. Moreover, compared with related work, it supports hierarchical categorization over 123 finer-grained categories, for which classification is much harder, and is therefore more effective for large-scale categorization.
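The weighted aggregation and top-down categorization described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the attribute weights, the tiny taxonomy, the sample project text, and the keyword-overlap scoring, which stands in for whatever classifier the paper actually trains. The function names `aggregate_attributes`, `score`, and `classify` are hypothetical.

```python
from collections import Counter


def aggregate_attributes(profiles, weights):
    """Merge a project's online attributes into one weighted bag of terms.

    profiles: list of (attribute_kind, text) pairs gathered from several
              repositories, e.g. ("description", "...") or ("tag", "...").
    weights:  dict mapping attribute_kind to a weight (assumed values).
    """
    bag = Counter()
    for kind, text in profiles:
        w = weights.get(kind, 0.0)
        for term in text.lower().split():
            bag[term] += w
    return bag


def score(bag, keywords):
    """Sum the aggregated weight of a category's keywords (stand-in scorer)."""
    return sum(bag[k] for k in keywords)


def classify(bag, node):
    """Descend the category tree, picking the best-scoring child per level."""
    path = []
    while node["children"]:
        best = max(node["children"], key=lambda c: score(bag, c["keywords"]))
        if score(bag, best["keywords"]) == 0:
            break  # no evidence for any child: stop at the current level
        path.append(best["name"])
        node = best
    return path


# Hypothetical two-level taxonomy; the paper's hierarchy has 123 categories.
TAXONOMY = {
    "name": "root", "keywords": set(), "children": [
        {"name": "Database", "keywords": {"database", "sql", "storage"},
         "children": [
             {"name": "Database Engines", "keywords": {"engine", "server"},
              "children": []},
             {"name": "Front-Ends", "keywords": {"gui", "client"},
              "children": []},
         ]},
        {"name": "Multimedia", "keywords": {"audio", "video", "player"},
         "children": []},
    ],
}

# The same project as profiled in two repositories, plus its tags.
profiles = [
    ("description", "A relational database server with SQL support"),
    ("description", "Fast SQL database engine"),
    ("tag", "database"), ("tag", "sql"), ("tag", "engine"),
]
weights = {"description": 1.0, "tag": 2.0}  # assumed: tags count double

bag = aggregate_attributes(profiles, weights)
path = classify(bag, TAXONOMY)
print(path)  # ['Database', 'Database Engines']
```

Weighting tags above free-text descriptions reflects the intuition that tags are short and topical, but the ratio used here is arbitrary and would need to be tuned on labeled data.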
Source: Chinese Journal of Computers (计算机学报), 2013, No. 10, pp. 2007-2018 (12 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
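Funding: Supported by the National High Technology Research and Development Program of China (863 Program, Grant No. 2012AA011201) and the National Natural Science Foundation of China (Grant No. 60903043).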
Keywords: software repository; open source software; hierarchical categorization; online attribute
