Abstract
Unmanned aerial vehicle (UAV) remote sensing road images exhibit a chaotic distribution of targets, large differences in target size, and a high proportion of negative samples. To address these problems, RA_YOLOv5, a road target detection model for UAV remote sensing images based on YOLOv5X, was proposed. Receptive field-coordinate attention convolution replaces the conventional convolution kernels in the backbone network, and an atrous spatial pyramid pooling-channel attention layer replaces the original feature pyramid pooling layer. An adaptive feature fusion layer is introduced into the feature fusion network; through weighted fusion of feature maps, it resolves the conflicts between samples and background across detection maps of different sizes. Decoupled detection heads compute the regression and classification tasks separately, and the loss function is replaced to alleviate the imbalance between positive and negative samples. Experimental results show that RA_YOLOv5 achieves a mean average precision of 90.42% on the VisDrone dataset, 7.85% higher than YOLOv5X, and reaches 35.46 frames per second in the test environment, producing usable detection output with good stability. The model can play an important role in scenarios such as road inspection, traffic flow monitoring, and emergency accident handling.
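The adaptive feature fusion step described above combines pyramid levels by learned per-level weights rather than simple addition. The sketch below illustrates that weighted-fusion idea with NumPy; the function names, the scalar fusion logits, and the use of a softmax to normalize them are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=0):
    # numerically stable softmax so the fusion weights sum to 1
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fuse(feature_maps, logits):
    """Weighted fusion of feature maps from different pyramid levels.

    feature_maps: list of arrays, each already resized to a common
                  (channels, height, width) shape.
    logits: per-level scalars (learnable in a real network); softmax
            turns them into non-negative weights summing to 1.
    """
    w = softmax(np.asarray(logits, dtype=np.float64))
    return sum(wi * f for wi, f in zip(w, feature_maps))

# three pyramid levels, assumed already resized to the same shape
rng = np.random.default_rng(0)
levels = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
fused = adaptive_fuse(levels, logits=[0.2, 1.0, -0.5])
print(fused.shape)  # (8, 16, 16)
```

In a real detector the fusion logits are produced per spatial location by small convolutions and trained end to end, so each position can favor the pyramid level whose receptive field best matches the object scale at that position.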
Author
CAO Dianlong (Liaoning Branch, China National Geologic Exploration Center of Building Materials Industry, Shenyang, Liaoning 110004, China)
Source
Beijing Surveying and Mapping, 2024, No. 6, pp. 936-941 (6 pages)
Keywords
unmanned aerial vehicle (UAV) remote sensing
road target detection
receptive field-coordinate attention
adaptive feature fusion
decoupled detection head