Abstract
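Maintaining the links between code and its corresponding documentation is important for software engineering activities such as software maintenance, program comprehension, and requirements tracing. The key to maintaining these links is extracting the link information. This paper proposes a method that uses information retrieval techniques to automatically extract link information between program source code and Chinese documents. First, a language probability model of the documents is built from the vocabulary extracted from them; on this basis, the document set is searched with queries composed of code information, yielding a relevance list and a relevance matrix between code and documents. Test results show that more than 95% of the links are obtained once the number of retrieved items exceeds 5.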
Tracing and maintaining links between free-text documents in Chinese and their source code plays an important role in software engineering. A new method based on information retrieval (IR) is proposed to do this work automatically. First, a stochastic language model is built that assigns a probability to every query string of words taken from the documents; then, for each source code file, a list of documents ranked by probability of relevance is generated. From these lists, a relevance matrix linking each source code file to the documents is obtained. Experiments show that more than 95 percent of the links can be traced when only the top 5 documents of the ranked list are taken.
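The pipeline outlined in the abstract (language model over the documents, code-derived queries, ranked lists, relevance matrix) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the model, so a unigram model with linear interpolation smoothing is assumed here, and every function name, the `lam` weight, and the sample data are hypothetical. It also assumes the Chinese documents have already been segmented into words and that code identifiers have been mapped to comparable terms.

```python
import math
from collections import Counter


def build_models(docs):
    """Count terms per document and over the whole collection."""
    doc_counts = {name: Counter(tokens) for name, tokens in docs.items()}
    collection = Counter()
    for counts in doc_counts.values():
        collection.update(counts)
    return doc_counts, collection


def log_likelihood(query, counts, collection, lam=0.5):
    """Log-probability of the query under one document's smoothed unigram model."""
    doc_len = sum(counts.values())
    coll_len = sum(collection.values())
    vocab = len(collection)
    score = 0.0
    for term in query:
        p_doc = counts[term] / doc_len if doc_len else 0.0
        # Add-one smoothing keeps the probability nonzero for unseen terms.
        p_coll = (collection[term] + 1) / (coll_len + vocab + 1)
        score += math.log(lam * p_doc + (1 - lam) * p_coll)
    return score


def rank_documents(query, doc_counts, collection, lam=0.5):
    """Return document names ordered from most to least likely to match the query."""
    scored = [(log_likelihood(query, c, collection, lam), name)
              for name, c in doc_counts.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


def relevance_matrix(code_queries, docs, top_k=5, lam=0.5):
    """Mark, for each code file, which documents fall within its top_k ranked list."""
    doc_counts, collection = build_models(docs)
    doc_names = sorted(docs)
    matrix = {}
    for code_name, query in code_queries.items():
        top = set(rank_documents(query, doc_counts, collection, lam)[:top_k])
        matrix[code_name] = [1 if d in top else 0 for d in doc_names]
    return doc_names, matrix


# Hypothetical, pre-segmented sample data.
sample_docs = {
    "login.txt": ["user", "login", "password", "login"],
    "report.txt": ["report", "print", "page"],
}
sample_queries = {
    "Login.java": ["login", "password"],
    "Report.java": ["print", "report"],
}
doc_names, matrix = relevance_matrix(sample_queries, sample_docs, top_k=1)
print(doc_names, matrix)
```

The `top_k` cutoff plays the role of the number of retrieved items mentioned in the abstract: a larger cutoff recovers more true links at the cost of more candidate documents to inspect per code file.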
Source
《计算机应用与软件》
CSCD
Peking University Core Journals (北大核心)
2005, No. 9, pp. 48-49, 110 (3 pages in total)
Computer Applications and Software