Abstract
Learning to use APIs and API sequences (APIs) correctly in code is a difficult process for software developers. When faced with unfamiliar libraries, or with code repositories such as GitHub that contain a large number of APIs, developers need recommendation tools or systems to assist them. DeepAPI, the best method we are aware of, captures the semantics of a user's query fairly well, but its RNN-based model has three problems: (1) it does not consider the weight of each word; (2) it compresses the input sequence into a fixed-length vector, which loses much useful information; (3) key information is lost when sentences are too long. To address these problems, this paper uses an attention-based model that distinguishes the importance of each word and handles the long-distance dependencies that arise from long queries. We crawled 649 open-source Java projects from GitHub and, after processing, obtained a training set of 114 364 comment-API sequence pairs. Experimental results show that our method improves the BLEU score over DeepAPI by roughly 20% or more at Top1, Top5, and Top10.
Authors
张睿峰
王鹏程
吴鸣
徐云
ZHANG Rui-Feng; WANG Peng-Cheng; WU Ming; XU Yun (School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China; Key Laboratory of High Performance Computing of Anhui Province, Hefei 230026, China)
Source
Computer Systems & Applications (《计算机系统应用》)
2019, Issue 9, pp. 209-214 (6 pages)
Funding
National Natural Science Foundation of China, General Program (61672480)
Keywords
API sequences
recommendation
attention mechanisms
deep learning