Abstract
The classical one-stage object detection algorithm RetinaNet has difficulty fully extracting and fusing features from different stages, and its bounding box regression is not sufficiently accurate. To address these problems, an improved RetinaNet algorithm for object detection is proposed. First, multispectral channel attention is added to the feature extraction module; it incorporates more frequency components of the input features into the attention processing, thereby capturing the rich information originally present in the features. Next, a multiscale feature fusion module, consisting of a path aggregation module and a feature fusion operation, is added after the feature extraction module. The path aggregation module builds a bottom-up path and uses the accurate localization signals in shallower feature layers to enhance the information flow of the entire feature pyramid. The feature fusion operation further improves the fusion of multistage features by combining feature information from every stage. Finally, the Complete Intersection over Union (CIoU) loss function is introduced into bounding box regression. This loss considers three important geometric factors, namely the overlap area of the bounding boxes, the distance between their center points, and their aspect ratios, to improve the convergence speed and accuracy of regression. Experimental results on the MS COCO and PASCAL VOC datasets show that, compared with RetinaNet, the improved algorithm raises average precision by 2.1 and 1.1 percentage points, respectively, and the gain is especially pronounced for larger objects in the MS COCO dataset.
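The three geometric factors named in the abstract (overlap area, center-point distance, aspect ratio) can be illustrated with a minimal sketch of the commonly cited CIoU loss formulation. This is an illustrative assumption, not the authors' implementation: the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are all hypothetical choices made for this example.

```python
import math


def ciou_loss(box_pred, box_gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; follows the widely used
    formulation: loss = 1 - IoU + rho^2 / c^2 + alpha * v.
    """
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Factor 1: overlap area, measured by IoU.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Factor 2: squared distance between box centers, normalized by the
    # squared diagonal of the smallest box enclosing both boxes.
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    enclose_w = max(px2, gx2) - min(px1, gx1)
    enclose_h = max(py2, gy2) - min(py1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Factor 3: aspect-ratio consistency term v and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    # Overlapping boxes of equal shape: only the IoU and distance terms act.
    print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
    # Disjoint boxes: IoU is zero, but the distance term still gives a signal.
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))
```

Unlike a plain IoU loss, the distance term keeps the loss informative even when the predicted and ground-truth boxes do not overlap, which is one reason such a loss is generally expected to converge faster.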
Authors
YU Min (于敏)
QU Dan (屈丹)
SI Nianwen (司念文)
YU Min; QU Dan; SI Nianwen (School of Software, Zhengzhou University, Zhengzhou 450000, China; School of Information Systems Engineering, Strategic Support Force Information Engineering University, Zhengzhou 450000, China)
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2022, Issue 8, pp. 249-257 (9 pages)
Funding
National Natural Science Foundation of China (62171470, 61673395).
Keywords
deep learning
object detection
multi-spectral channel attention
multi-scale feature fusion
Complete Intersection over Union (CIoU)