Bottom-up panoptic segmentation fusing improved ASPP and polarized self-attention (融合改进ASPP和极化自注意力的自底向上全景分割). Cited by: 2

The improved atrous spatial pyramid pooling and polarized self-attention based bottom-up panoptic segmentation

Abstract

Objective: In atrous spatial pyramid pooling (ASPP), atrous convolution becomes less effective as the dilation rate grows, and the residual neural network (ResNet), a classic image classification model, does not transfer well to fine-grained image segmentation. To address both problems, a bottom-up panoptic segmentation method based on improved ASPP and polarized self-attention is proposed.

Method: The ASPP module is redesigned. The output of the convolution with the small dilation rate is concatenated (concat) with the original input, and the result is passed as the new input to the convolution with the larger dilation rate. The outputs of the convolutions at the different dilation rates are then concatenated, and that result is finally concatenated with the outputs of the other ASPP modules. This mitigates the degradation of atrous convolution at large dilation rates, so that the module obtains a sufficient receptive field while still encoding multi-scale information. An improved polarized self-attention module is introduced after the output of the backbone network to strengthen pixel-level self-attention, so that the resulting features are directly suitable for fine-grained pixel segmentation.

Result: The method is tested on the validation set of the Cityscapes dataset. Compared with the reproduced baseline Panoptic-DeepLab (58.26%), the improved ASPP module raises panoptic quality (PQ) to 58.61%, a gain of 0.35 percentage points, while the runtime increases from 103 ms to 124 ms, with no marked change in speed. Introducing polarized self-attention raises PQ to 58.86%, a further gain of 0.25 percentage points, and the runtime increases to 187 ms. Improving the attention module raises PQ to 59.36%, a gain of 0.50 percentage points over 58.86%, and the runtime increases to 192 ms. Speed drops slightly, but real-time performance remains better than that of most methods.

Conclusion: With the improved ASPP and polarized self-attention modules, the method extracts features suited to fine-grained pixel segmentation more effectively and encodes multi-scale information while keeping a sufficient receptive field, which improves panoptic segmentation performance.

Extended abstract

Objective: Panoptic segmentation is a challenging task in computer vision and image segmentation. It segments every object in an image, covering both foreground "thing" categories and background "stuff" categories. It combines semantic segmentation and instance segmentation, and is relevant to vision applications such as autonomous driving, simultaneous localization and mapping (SLAM), and multi-object tracking and segmentation (MOTS). Most panoptic segmentation methods follow a top-down path, on the principle of detection before segmentation. Such methods build on instance segmentation or object detection and add a semantic branch for semantic segmentation. Their segmentation performance is good, but they need a complex post-processing stage to resolve conflicts between and within branches, which slows inference. Another category of methods is bottom-up: semantic segmentation is the basis, and the image is recognized as a whole at the pixel level, which avoids the tedious post-processing. Recently, a bottom-up panoptic segmentation method, Panoptic-DeepLab, divided the panoptic segmentation task into two branches. Each branch has its own decoder network and segmentation head network. The semantic segmentation head outputs the semantic segmentation result, and two instance heads with the same structure predict the instance center and the offset. This design achieves better segmentation accuracy and speed. However, the decoder network still uses the ASPP module to increase the receptive field. To obtain a large enough receptive field, ASPP needs a sufficiently large dilation rate, and the larger the dilation rate, the worse atrous convolution performs. In addition, ResNet is used as the shared encoder, which may be sub-optimal for fine-grained image segmentation. To resolve these problems, we develop a new panoptic segmentation model with better segmentation performance.

Method: A bottom-up panoptic segmentation method is developed based on improved ASPP and polarized self-attention. First, ASPP is redesigned as improved atrous spatial pyramid pooling (IASPP). Specifically, 1) the output of the 3×3 convolution with dilation rate rate1 is concatenated with the original input and fed into the 3×3 convolution with dilation rate rate2; 2) the outputs of the 3×3 convolutions with dilation rates rate1 and rate2 are concatenated with the original input and fed into the 3×3 convolution with dilation rate rate3. The outputs of the convolutions at the different dilation rates are then concatenated, and the result is finally concatenated with the outputs of the other ASPP modules. Through this series of atrous convolutions and feature concatenations, the final output of IASPP obtains a larger receptive field without the kernel degradation seen in ASPP. IASPP does not significantly increase the size of the model, and it does not reduce the speed of the model dramatically. In addition, polarized self-attention (PSA) is used to further enhance the feature extraction ability of the shared backbone. An improved polarized self-attention (IPSA) module is introduced after the fourth layer of ResNet-50 to extract pixel-level features. This strengthens the ability of ResNet to extract pixel-level information. The output features preserve pixel-level information and can be applied directly to typical fine-grained image segmentation tasks to estimate highly nonlinear pixel-wise semantics.

Result: The method is tested on the Cityscapes dataset, which has 19 categories: 11 background and 8 foreground. It contains 2975 training images, 500 validation images, and 1525 test images, each approximately 1024×2048 pixels. The training set is used to train the network and the validation set is used to test it. Compared with the baseline, the PQ of the proposed model improves from 58.26% to 58.61% when the IASPP module is integrated, and the runtime increases from 103 ms to 124 ms. After PSA is added, PQ improves from 58.61% to 58.86%, at the cost of the runtime rising from 124 ms to 187 ms. After the polarized self-attention is improved (IPSA), PQ improves from 58.86% to 59.36%, while the runtime reaches 192 ms. We also carried out further experiments, including visualization of segmentation results, per-category performance comparison, and comparison with other related methods.

Conclusion: To improve bottom-up panoptic segmentation, a method is developed based on improved ASPP (IASPP) and improved polarized self-attention (IPSA). The redesigned ASPP effectively resolves the failure of atrous convolution caused by the increase of the dilation rate in ASPP. The introduction of IPSA improves the ability of ResNet-50 to extract pixel-level fine-grained features, and rich pixel-level feature information is preserved during feature extraction to estimate highly nonlinear pixel-wise semantics. The method thus improves the overall performance of panoptic segmentation: it achieves better segmentation accuracy while maintaining good speed.
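The IASPP wiring in the Method description can be traced with a few lines of arithmetic. The sketch below is not the authors' implementation: it only tracks, for each 3×3 atrous branch, how many input channels it sees after concatenation and the largest theoretical receptive field along its longest path. The dilation rates (6, 12, 18) and the 256 output channels per branch are illustrative assumptions, not values from the abstract; 2048 is the channel count of a standard ResNet-50 final stage. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BranchInfo:
    rate: int             # dilation rate of this 3x3 atrous convolution
    in_channels: int      # channels the branch sees after concatenation
    receptive_field: int  # side length (in pixels) of the largest receptive field


def iaspp_branches(in_channels: int, branch_channels: int,
                   rates: Sequence[int]) -> List[BranchInfo]:
    """Trace densely concatenated 3x3 atrous branches (hypothetical helper)."""
    infos = []
    rf = 1  # a single pixel of the original input feature map
    for i, rate in enumerate(rates):
        # Branch i receives the original input plus the outputs of all
        # earlier branches, concatenated along the channel axis.
        channels = in_channels + i * branch_channels
        # A 3x3 convolution with dilation `rate` spans 2 * rate + 1 pixels,
        # so it widens the receptive field of its input by 2 * rate.
        rf += 2 * rate
        infos.append(BranchInfo(rate, channels, rf))
    return infos


def aspp_receptive_fields(rates: Sequence[int]) -> List[int]:
    """Receptive fields of parallel (plain ASPP) 3x3 atrous branches."""
    return [2 * rate + 1 for rate in rates]


# Assumed, illustrative configuration (not taken from the paper).
RATES = (6, 12, 18)
for info in iaspp_branches(in_channels=2048, branch_channels=256, rates=RATES):
    print(info)
print(aspp_receptive_fields(RATES))
```

Under these assumed rates, the cascaded branches reach receptive fields of 13, 37, and 73 pixels, against 13, 25, and 37 for the same rates applied in parallel. This is the sense in which the concatenated design can enlarge the receptive field without resorting to a larger dilation rate. The numbers are theoretical upper bounds along the longest path, not measured effective receptive fields.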

Authors: Li Xinye (李新叶); Chen Ding (陈丁) (Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China; Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China)

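Affiliations: Department of Electronic and Communication Engineering, North China Electric Power University; Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University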

Source: Journal of Image and Graphics (《中国图象图形学报》), indexed in CSCD and the Peking University Core Journals list, 2023, No. 8, pp. 2410-2419 (10 pages)

Funding: Fundamental Research Funds for the Central Universities (2020YJ006); Hebei Provincial Science and Technology Program (SZX2020034).

Keywords: panoptic segmentation; semantic segmentation; instance segmentation; polarized self-attention; atrous spatial pyramid pooling (ASPP)

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

