期刊文献+

基于特征提取的开源社区Fork摘要自动生成方法

Approach of Automatic Fork Summary Generation in Open Source Community Based on Feature Extraction
下载PDF
导出
摘要 当前,基于P/R的分布式协同开发已经成为开源社区中的主导软件开发方式。开发者通过Fork复制软件项目的版本库,创建自身分支,并在新建分支中进行独立开发。由于P/R协同开发模型具有开放性、透明性和并行化等特征,开发人员在Fork项目时难以掌握项目的Fork概况,不知道其他开发人员是否已通过Fork开展相同或类似的开发工作,从而容易产生重复性的贡献和冗余性开发。针对这个问题,提出一种Fork摘要的自动生成方法以帮助项目管理者加强项目管控,避免冗余贡献,增强合作交流。该方法首先爬取开源社区中具有Feature和Bug标签信息的Issue数据,采用随机森林方法训练一个分类器模型,以对Fork特征进行分类;随后收集Fork分支的软件开发活动数据,采用TextRank算法生成Fork详细信息以解释Fork的主要目的;最后设计了一组组合规则及相应的算法来整合Fork的类别、特征和其他信息,以形成完整的Fork摘要。为了检验所提方法在指导分布式协同开发方面的有效性,在Github上进行了30组人工测试和60组实际案例测试。结果表明,所提方法生成的Fork摘要的准确率达到67.2%,实验中76%的项目管理者认为Fork摘要有助于更好地管理项目,加强沟通与合作。 At present,distributed collaborative development based on P/R has become the dominant software development me-thod in open source community.Because of the openness,transparency and parallelism of the software development in P/R mo-del,it is difficult for developers to obtain the complete Fork profile of the whole project,and know whether other developers have accomplished the same or similar development tasks,which are prone to duplicate contributions and redundant development.To solve this problem,this paper proposed an automatic generation method of Fork summary to help project managers strengthen project management,avoid redundant contributions,and enhance cooperation and communication among developers.The proposed method firstly crawls Issue data with feature and Bug label information in open source community,and trains a classifier model with random forest method to classify Fork features.Then,it collects the data of Fork branch’s software development activities and uses TextRank algorithm to generate detailed Fork information to explain the main purpose of Fork activity.Finally,a set of combination rules and corresponding algorithm are designed to integrate Fork’s categories,features and other information to form a complete Fork summary.In order to validate the effectiveness of the proposed method,30 groups of manual tests and 60 groups of actual live study were conducted on Github.The results show that the accuracy of Fork summary generated by this method is 67.2%.In the experiment,76%of project managers believe that Fork summary can help to better manage projects,and strengthen communication and cooperation.
作者 张超 毛新军 卢遥 ZHANG Chao;MAO Xin-jun;and LU Yao(College of Computer Science and Technology,National University of Defense Technology,Changsha 410000,China;Key Laboratory of Complex System Software Engineering,Changsha 410000,China)
出处 《计算机科学》 CSCD 北大核心 2020年第3期25-33,共9页 Computer Science
基金 国家重点研发计划项目(2018YFB1004202) NSFC(61532004)~~
关键词 开源软件 开源社区 Fork摘要 分布式开发 Opens source Open source community Fork summary Distributed cooperative development
  • 相关文献

参考文献1

二级参考文献2

共引文献18

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部