Journal Articles
1,171 articles found
1. Bridge Crack Segmentation Method Based on Parallel Attention Mechanism and Multi-Scale Features Fusion (cited by 1)
Authors: Jianwei Yuan, Xinli Song, Huaijian Pu, Zhixiong Zheng, Ziyang Niu. Computers, Materials & Continua (SCIE, EI), 2023, Issue 3, pp. 6485-6503.
Abstract: Regular inspection of bridge cracks is crucial to bridge maintenance and repair. Traditional manual crack detection is time-consuming, dangerous, and subjective, while existing mainstream vision-based automatic crack detection algorithms struggle to detect fine cracks and to balance detection accuracy against speed. This paper therefore proposes a new bridge crack segmentation method based on a parallel attention mechanism and multi-scale feature fusion, built on the DeeplabV3+ framework. First, an improved lightweight MobileNetV2 network and dilated separable convolutions replace the original Xception backbone and the atrous spatial pyramid pooling (ASPP) module, respectively, dramatically reducing the number of parameters and accelerating training and prediction. Second, a parallel attention mechanism is introduced into both the encoding and decoding stages, enhancing attention to crack regions along both the channel and spatial dimensions and significantly suppressing interference from noise. Finally, a multi-scale feature fusion module further improves detection of fine cracks. Validated on a self-made dataset, the method is more accurate than competing methods: intersection over union (IoU) and F1-score rise to 77.96% and 87.57%, respectively. The model has only 4.10M parameters, far fewer than the original network, and runs at 15 frames/s. The results show that the proposed method meets the requirements of rapid and accurate bridge crack detection.
Keywords: crack detection; DeeplabV3+; parallel attention mechanism; feature fusion
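Several entries in this listing describe attention applied along the channel and spatial axes in parallel. The following NumPy sketch illustrates the general idea only (squeeze-and-gate on each axis, then a parallel sum); the exact module designs in the papers above differ and are not reproduced here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def parallel_attention(x):
    """Illustrative parallel channel + spatial attention.

    x: feature map of shape (C, H, W). The channel branch squeezes the
    spatial dims, the spatial branch squeezes the channel dim, and the
    two re-weighted maps are summed (parallel, not serial, layout).
    """
    ch = sigmoid(x.mean(axis=(1, 2)))   # (C,)   channel weights in (0, 1)
    sp = sigmoid(x.mean(axis=0))        # (H, W) spatial weights in (0, 1)
    return x * ch[:, None, None] + x * sp[None, :, :]

x = np.random.rand(8, 4, 4)
y = parallel_attention(x)               # same shape as the input map
```

In a real network the squeeze step would feed a small learned MLP or convolution rather than a bare mean; the mean is used here only to keep the sketch self-contained.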
2. Attention Guided Food Recognition via Multi-Stage Local Feature Fusion
Authors: Gonghui Deng, Dunzhi Wu, Weizhen Chen. Computers, Materials & Continua (SCIE, EI), 2024, Issue 8, pp. 1985-2003.
Abstract: Food image recognition, a nuanced subset of fine-grained image recognition, grapples with substantial intra-class variation and minimal inter-class differences, challenges compounded by the irregular and multi-scale nature of food images. Addressing these complexities, this study introduces a model that leverages multiple attention mechanisms and multi-stage local fusion, grounded in the ConvNeXt architecture. The model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module that fosters long-distance dependencies between feature maps at varying stages, facilitating the assimilation of complementary features across scales and significantly bolstering feature extraction. The authors additionally constructed a dataset named Roushi60, consisting of 60 categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets yields recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively: improvements of 1.04%, 3.42%, and 1.36% over the baseline ConvNeXt network that also surpass most contemporary food image recognition methods, setting a new benchmark for the field.
Keywords: fine-grained image recognition; food image recognition; attention mechanism; local feature fusion
3. AF-Net: A Medical Image Segmentation Network Based on Attention Mechanism and Feature Fusion (cited by 4)
Authors: Guimin Hou, Jiaohua Qin, Xuyu Xiang, Yun Tan, Neal N. Xiong. Computers, Materials & Continua (SCIE, EI), 2021, Issue 11, pp. 1877-1891.
Abstract: Medical image segmentation is an important application of computer vision in medical image processing. Because different organs in medical images lie close together and look highly similar, current segmentation algorithms suffer from mis-segmentation and poor edge segmentation. To address these challenges, the authors propose a medical image segmentation network (AF-Net) based on attention mechanisms and feature fusion, which captures global information while focusing the network on the object area. Dual attention blocks (DA-block) comprising parallel channel and spatial attention branches are added to the backbone to adaptively calibrate and weigh features. A multi-scale feature fusion block (MFF-block) obtains feature maps with different receptive fields, gathering multi-scale information at little computational cost. Finally, to restore the locations and shapes of organs, global feature fusion blocks (GFF-block) fuse high-level and low-level information for accurate pixel positioning. Evaluated on multiple datasets (aorta and lungs), the method achieves 94.0% mIoU and 96.3% DICE, outperforming U-Net and other state-of-the-art methods.
Keywords: deep learning; medical image segmentation; feature fusion; attention mechanism
4. FIBTNet: Building Change Detection for Remote Sensing Images Using Feature Interactive Bi-Temporal Network
Authors: Jing Wang, Tianwen Lin, Chen Zhang, Jun Peng. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 4621-4641.
Abstract: This paper designs a feature interactive bi-temporal change detection network (FIBTNet) to suppress pseudo changes in building change detection from remote sensing images, improving accuracy through bi-temporal feature interaction. FIBTNet combines a bi-temporal feature exchange architecture (EXA) with a bi-temporal difference extraction architecture (DFA). EXA improves feature exchange during encoding through multiple spatial, channel, or hybrid feature exchange methods, while DFA uses a change residual (CR) module to improve multi-scale difference feature extraction during decoding. Additionally, at the junction of encoder and decoder, channel exchange is combined with the CR module to achieve adaptive channel exchange, further improving feature-fusion decision making. Experiments on the LEVIR-CD and S2Looking datasets demonstrate that FIBTNet achieves superior F1 score, intersection over union (IoU), and recall compared with mainstream building change detection models, confirming its effectiveness in remote sensing change detection.
Keywords: change detection; change residual module; feature exchange mechanism; feature fusion
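The channel-exchange idea in the abstract above (letting each temporal branch see some of the other epoch's channels) can be sketched minimally as below. This is an assumption-laden illustration of the generic mechanism, not FIBTNet's actual EXA module:

```python
import numpy as np

def channel_exchange(f1, f2, p=2):
    """Swap every p-th channel between two (C, H, W) temporal feature maps.

    After the exchange, each branch carries a mix of both epochs'
    features, which is the intuition behind bi-temporal feature
    interaction (the real module's channel selection is learned).
    """
    g1, g2 = f1.copy(), f2.copy()
    mask = np.arange(f1.shape[0]) % p == 0   # channels to exchange
    # Fancy indexing copies, so the simultaneous swap below is safe.
    g1[mask], g2[mask] = f2[mask], f1[mask]
    return g1, g2

a = np.zeros((4, 2, 2))      # epoch-1 features (all zeros, for clarity)
b = np.ones((4, 2, 2))       # epoch-2 features (all ones)
a2, b2 = channel_exchange(a, b)
```

With `p=2`, channels 0 and 2 of each map end up in the other branch while channels 1 and 3 stay put.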
5. Improving VQA via Dual-Level Feature Embedding Network
Authors: Yaru Song, Huahu Xu, Dikai Fang. Intelligent Automation & Soft Computing, 2024, Issue 3, pp. 397-416.
Abstract: Visual question answering (VQA), a crucial task integrating vision and language, has sparked widespread interest. VQA systems primarily use attention mechanisms to associate relevant visual regions with the input question. Detection-based features extracted by an object detection network capture the visual attention distribution over predetermined detection boxes and provide object-level insight, so they answer questions about foreground objects effectively; however, lacking fine-grained detail, they cannot answer questions about background regions without detection boxes, which is where grid-based features excel. This paper proposes a Dual-Level Feature Embedding (DLFE) network that integrates grid-based and detection-based image features in a unified architecture to realize their complementary advantages. Specifically, a novel Dual-Level Self-Attention (DLSA) module first mines the intrinsic properties of the two feature types, with Positional Relation Attention (PRA) designed to model position information. A Feature Fusion Attention (FFA) module then addresses the semantic noise caused by fusing the two features and constructs an alignment graph to enhance and align the grid and detection features. Finally, co-attention learns interactive image-question features to answer questions more accurately. The method significantly improves on the baseline, raising accuracy from 66.01% to 70.63% on the VQA 1.0 test-std set and from 66.24% to 70.91% on the VQA 2.0 test-std set.
Keywords: visual question answering; multi-modal feature processing; attention mechanisms; cross-modal fusion
6. Multiscale Feature Learning and Attention Mechanism for Infrared and Visible Image Fusion
Authors: GAO Li, LUO DeLin, WANG Song. Science China (Technological Sciences) (SCIE, EI, CAS, CSCD), 2024, Issue 2, pp. 408-422.
Abstract: Current fusion methods for infrared and visible images tend to extract features at a single scale, which results in insufficient detail and incomplete feature preservation. To address these issues, the authors propose an infrared and visible image fusion network based on multiscale feature learning and an attention mechanism (MsAFusion). A multiscale dilated convolution framework captures image features across scales and broadens the perceptual scope, and an attention network enhances the focus on salient targets in infrared images and detailed textures in visible images. To compensate for information loss during convolution, skip connections are used during image reconstruction. Fusion is trained without supervision using a combined loss of pixel loss and gradient loss. Extensive experiments on a dataset of electricity facilities show the method outperforms nine state-of-the-art methods in visual perception and on four objective evaluation metrics.
Keywords: infrared and visible images; image fusion; attention mechanism; CNN; feature extraction
7. Adaptive Multi-Modal Feature Fusion for Far and Hard Object Detection
Authors: LI Yang, GE Hongwei. Journal of Measurement Science and Instrumentation (CAS, CSCD), 2021, Issue 2, pp. 232-241.
Abstract: To solve the difficult detection of far and hard objects caused by the sparseness and insufficient semantic information of LiDAR point clouds, a 3D object detection network with adaptive multi-modal data fusion is proposed, exploiting multi-neighborhood voxel information together with image information. First, an improved ResNet maintains the structural information of far and hard objects in low-resolution feature maps, making it better suited to detection; the semantics of each image feature map are also enhanced with semantic information from all subsequent feature maps. Second, multi-neighborhood context information with different receptive-field sizes is extracted to compensate for point-cloud sparseness, improving the ability of voxel features to represent the spatial structure and semantics of objects. Finally, a multi-modal adaptive feature fusion strategy uses learnable weights to express each modality's contribution to the detection task, and voxel attention further strengthens the fused representation of valid targets. On the KITTI benchmark the method outperforms VoxelNet by remarkable margins, increasing AP by 8.78% and 5.49% on the moderate and hard difficulty levels, and it surpasses many mainstream multi-modal methods, e.g. exceeding the AP of MVX-Net by 1% on the moderate and hard levels.
Keywords: 3D object detection; adaptive fusion; multi-modal data fusion; attention mechanism; multi-neighborhood features
8. Triple Multimodal Cyclic Fusion and Self-Adaptive Balancing for Video Q&A Systems
Authors: Xiliang Zhang, Jin Liu, Yue Li, Zhongdai Wu, Y. Ken Wang. Computers, Materials & Continua (SCIE, EI), 2022, Issue 12, pp. 6407-6424.
Abstract: The performance of video question and answer (VQA) systems relies on capturing key information from both visual images and natural language to generate relevant answers. However, traditional linear combinations of multimodal features capture only shallow feature interactions, falling far short of the deep fusion required. Attention mechanisms have been used for deep fusion, but most handle weight assignment for single-modal information only, leading to attention imbalance across modalities. To address these problems, the authors propose a VQA model based on Triple Multimodal feature Cyclic Fusion (TMCF) and a Self-Adaptive Multimodal Balancing mechanism (SAMB), designed to enhance complex interactions among multimodal features with cross-modal information balancing. TMCF and SAMB can also serve as an extensible plug-in for exploring new feature combinations in the visual image domain. Extensive experiments on the MSVD-QA and MSRVTT-QA datasets confirm the advantages of the approach for multimodal tasks, and ablation studies verify the effectiveness of each proposed component.
Keywords: video question and answer systems; feature fusion; scaling matrix; attention mechanism
9. Improved YOLOv5 Military Target Recognition Algorithm for Complex Battlefield Environments (cited by 4)
Authors: 宋晓茹, 刘康, 高嵩, 陈超波, 阎坤. 《兵工学报》 (EI, CAS, CSCD core), 2024, Issue 3, pp. 934-947.
Abstract: Military target recognition in complex battlefield environments is fundamental to battlefield intelligence gathering. To address the high missed- and false-detection rates and poor real-time performance of current techniques, a PB-YOLO military target recognition algorithm based on an improved YOLOv5 model is proposed. The anchor boxes for land-battlefield military units are re-clustered to better fit target sizes and accelerate model convergence. A channel-spatial parallel attention mechanism increases the model's attention to target feature and position information in complex battlefield environments. In the feature fusion network, BiFPN improves fusion capability and speed. The Alpha-IoU loss function accelerates convergence and resolves the degradation of the IoU computation when the ground-truth and predicted boxes coincide. On a self-built military target dataset, the improved algorithm reaches an mAP of 90.17% while keeping the model's space complexity comparable to mainstream recognition algorithms. Ablation comparisons show an 11.57% accuracy gain over the original model, providing effective technical support for battlefield intelligence gathering.
Keywords: military target recognition; channel-spatial parallel attention mechanism; feature fusion; loss function
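The Alpha-IoU loss mentioned in the entry above raises IoU to a power alpha, which up-weights the gradient for nearly matched boxes. A minimal sketch (plain IoU for axis-aligned boxes, then the power-law loss; box coordinates are illustrative):

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def alpha_iou_loss(box_a, box_b, alpha=3.0):
    """Alpha-IoU loss: 1 - IoU**alpha. With alpha > 1, high-IoU pairs
    contribute relatively stronger gradients than with plain 1 - IoU."""
    return 1.0 - iou(box_a, box_b) ** alpha
```

A perfectly matched pair gives loss 0, disjoint boxes give loss 1; full Alpha-IoU variants in practice also add distance or aspect-ratio penalty terms, which are omitted here.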
10. Image Super-Resolution Reconstruction with a Second-Order Layer-by-Layer Feature Fusion Network (cited by 1)
Authors: 于蕾, 邓秋月, 郑丽颖, 吴昊宇. 《系统工程与电子技术》 (EI, CSCD core), 2024, Issue 2, pp. 391-400.
Abstract: Some super-resolution networks neglect the reuse and fusion of features from the network's different levels. To obtain reconstructed images with high resolution and high fidelity, a second-order layer-by-layer feature fusion super-resolution network with strong feature reuse and fusion capability is constructed. Its core is the layer-by-layer feature fusion module, which strengthens feature reuse through fusion operations. A second-order feature fusion mechanism is also proposed, applying layer-by-layer fusion at both the local and global levels of the network. Experiments show the reconstructed images have sharper lines and contours and achieve better peak signal-to-noise ratio (PSNR) and structural similarity (SSIM). For example, at a scale factor of 2, PSNR/SSIM on the test sets reach 38.20 dB/0.9612, 33.81 dB/0.9195, 32.28 dB/0.9010, 32.65 dB/0.9324, and 39.11 dB/0.9779, improvements over other models that demonstrate the network's advantages by both objective measures and subjective inspection.
Keywords: super-resolution reconstruction; convolutional neural network; feature fusion; second-order feature fusion mechanism
11. Robot Riveting Defect Detection Method Based on Improved DETR (cited by 1)
Authors: 李宗刚, 宋秋凡, 杜亚江, 陈引娟. 《铁道科学与工程学报》 (EI, CAS, CSCD core), 2024, Issue 4, pp. 1690-1700.
Abstract: Riveting is the main joining method for railway vehicle structural parts, and sound riveting quality is essential to safe, stable vehicle operation. Existing riveting defect detection methods suffer from low accuracy, few inspection points, and limited intelligence, so a robot riveting defect detection method based on an improved DETR is proposed. First, a riveting defect detection system is built to collect defect images under conditions of large workpieces and small rivets. Second, to strengthen DETR's feature extraction and detection of small targets, EfficientNet serves as the backbone, with the 3-D weighted attention mechanism SimAM embedded in it to preserve rivet-head shape information in the feature layers and the spatial information of the riveting region. Third, a weighted bidirectional feature pyramid module in the neck aggregates multi-scale features from the EfficientNet output, enlarging inter-class differences among defect types. Finally, a linear combination of Smooth L1 and DIoU replaces the original regression loss, improving accuracy and convergence speed. The improved model achieves a mean average precision (mAP) of 97.12% for riveting defects at 25.4 frames/s, outperforming Faster R-CNN, YOLOX, and other mainstream detectors in both accuracy and speed. The method meets the need for real-time online detection of small-rivet defects on large riveted parts and offers a reference for applying visual inspection to riveting processes.
Keywords: riveting defect detection; DETR; EfficientNet; 3-D attention mechanism; multi-scale weighted feature fusion
12. Improved Real-Time Infrared Small Target Detection Based on YOLOv5s (cited by 1)
Authors: 谷雨, 张宏宇, 彭冬亮. 《激光与红外》 (CAS, CSCD core), 2024, Issue 2, pp. 281-288.
Abstract: To cope with the low resolution, complex backgrounds, and missing target detail of infrared images, an improved real-time infrared small target detection model based on YOLOv5s, Infrared-YOLOv5s, is proposed. In feature extraction, SPD-Conv performs downsampling by slicing the feature map into sub-maps and concatenating them along the channel axis, avoiding the feature loss that strided downsampling causes during multi-scale extraction; an improved spatial pyramid pooling module based on dilated convolution fuses features with different receptive fields to strengthen feature extraction. In feature fusion, a deep-to-shallow attention module embeds deep semantic features into shallow spatial features, enhancing the expressiveness of the shallow features. In prediction, the feature extraction, fusion, and prediction layers devoted to large targets are pruned, shrinking the model while improving real-time performance. Ablation experiments verify the effectiveness of each proposed module. On the SIRST dataset the improved model reaches 95.4% mAP, 2.3% above the original YOLOv5s, with a 72.9% smaller model of only 4.5 MB, and it infers at 28 f/s on an Nvidia Xavier, which favors practical deployment. Transfer experiments on the Infrared-PV dataset further verify the improved algorithm. The model raises infrared small target detection performance while meeting real-time requirements, making it suitable for real-time infrared small target detection tasks.
Keywords: infrared small target detection; YOLOv5s; attention mechanism; feature fusion
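The SPD-Conv downsampling described above (slice the feature map into sub-maps, concatenate along channels) is a lossless space-to-depth rearrangement. A minimal NumPy sketch of that rearrangement alone, without the convolution that normally follows it:

```python
import numpy as np

def space_to_depth(x, s=2):
    """SPD-style downsampling of a (C, H, W) map.

    The map is sliced into s*s spatial sub-maps (one per pixel offset)
    which are stacked along the channel axis, so every input value
    survives, unlike strided convolution or pooling.
    """
    subs = [x[:, i::s, j::s] for i in range(s) for j in range(s)]
    return np.concatenate(subs, axis=0)   # (C*s*s, H//s, W//s)

x = np.arange(16, dtype=float).reshape(1, 4, 4)
y = space_to_depth(x)                     # shape (4, 2, 2), all 16 values kept
```

In SPD-Conv proper, a non-strided convolution is applied to the stacked result to mix the sub-maps; that learned step is omitted here.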
13. Improved Lightweight Pedestrian Target Detection Algorithm (cited by 1)
Authors: 金梅, 任婷婷, 张立国, 闫梦萧, 沈明浩. 《计量学报》 (CSCD core), 2024, Issue 2, pp. 186-193.
Abstract: To address the low detection accuracy caused by dense pedestrian targets, small target scales, and varying background illumination, a lightweight pedestrian detection algorithm based on feature fusion is proposed on the TinyYOLOv4 framework. First, a new backbone feature extraction network (CSPDarknet53-S) is built by adding a feature extraction module (REM) to the original backbone, strengthening the network's ability to extract pedestrian features. Second, the feature fusion structure is improved: after the backbone extracts high- and low-level feature maps, a fusion module (RM-block) inserted between the backbone and the fusion network enlarges the receptive field, and shallow feature information is introduced to retain more small-target features, forming a new feature fusion network (IFFM). Finally, the YOLO Head processes the fused feature maps to produce the output. Experiments show the proposed algorithm achieves high detection accuracy and good results on pedestrian data (the person subsets of PASCAL VOC2007 and VOC2012).
Keywords: object detection; feature fusion; shallow features; TinyYOLOv4; attention mechanism
14. Dual-Channel Feature Fusion for Point Cloud Semantic Segmentation of Real Scenes (cited by 1)
Authors: 孙刘杰, 朱耀达, 王文举. 《计算机工程与应用》 (CSCD core), 2024, Issue 12, pp. 160-169.
Abstract: Real-scene point clouds carry both the spatial geometry of the points and the color of the 3D objects, yet existing networks cannot effectively exploit the local features and spatial geometric information of real scenes. A dual-channel feature fusion method, DCFNet (dual-channel feature fusion of real scene for point cloud semantic segmentation), is therefore proposed for indoor and outdoor scenes. To fully extract the color information of real-scene point clouds, the method uses upper and lower input channels with identical feature extraction structures: the upper channel takes full RGB color plus point coordinates and focuses on the features of complex objects, while the lower channel takes only point coordinates and focuses on spatial geometry. Within each channel, an inter-layer fusion module and a Transformer channel feature expansion module improve the extraction of local and global information and the network's performance. Because existing 3D semantic segmentation methods neglect the link between local and global features and thus segment complex scenes poorly, the features extracted by the two channels are fused by a DCFFS (dual-channel feature fusion segmentation) module before the real scene is segmented. On benchmarks for complex indoor scenes and large-scale indoor and outdoor scenes, DCFNet reaches a mean intersection over union (mIoU) of 71.18% on the S3DIS Area 5 indoor dataset and 48.87% on the STPLS3D outdoor dataset, with mean accuracy (mAcc) of 77.01% and overall accuracy (OAcc) of 86.91%, achieving high-precision point cloud semantic segmentation of real scenes.
Keywords: deep learning; dual-channel feature fusion; point cloud semantic segmentation; attention mechanism
15. Steel Surface Defect Detection Algorithm Based on MCB-FAH-YOLOv8 (cited by 5)
Authors: 崔克彬, 焦静颐. 《图学学报》 (CSCD core), 2024, Issue 1, pp. 112-125.
Abstract: Existing deep-learning steel surface defect detectors suffer from false detections, missed detections, and low accuracy. A YOLOv8 steel surface defect detection algorithm based on a modified CBAM (MCB) and a replaceable four-head ASFF prediction head (FAH), abbreviated MCB-FAH-YOLOv8, is proposed. The improved convolutional block attention module (CBAM) localizes dense targets better; replacing the FPN structure with BiFPN extracts context information more efficiently; adaptive spatial feature fusion (ASFF) automatically finds the most suitable feature combination; and the SPPF module is replaced by the more accurate SimCSPSPPF module. For tiny-object detection, a four-head ASFF prediction head is proposed that can be swapped in according to dataset characteristics. MCB-FAH-YOLOv8 reaches 88.8% mAP on the VOC2007 dataset and 81.8% mAP on the NEU-DET steel defect dataset, 5.1% and 3.4% above the baseline model, respectively, achieving high accuracy at a small cost in detection speed and balancing the two well.
Keywords: MCB-FAH-YOLOv8; defect detection; attention mechanism; four-head ASFF prediction head; feature fusion
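Several entries here adopt BiFPN, whose core operation is a fast normalized weighted fusion of feature maps: each input gets a learnable non-negative weight, and the weights are normalized to sum to roughly one. A minimal sketch of that fusion step (weight values are illustrative, and the maps are assumed to be already resized to a common shape):

```python
import numpy as np

def weighted_fusion(feats, w, eps=1e-4):
    """BiFPN-style fast normalized fusion.

    feats: list of same-shape feature maps; w: one raw weight per map.
    ReLU keeps weights non-negative, and dividing by their sum (plus a
    small eps for stability) makes the fusion a soft average.
    """
    w = np.maximum(w, 0.0)
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, feats))

f1 = np.ones((2, 2))
f2 = 3.0 * np.ones((2, 2))
out = weighted_fusion([f1, f2], np.array([1.0, 1.0]))   # close to the mean map
```

With equal weights the result is essentially the element-wise mean; during training the weights are learned per fusion node, letting the network favor the more informative scale.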
16. Scene Text Detection Based on Multi-Scale Attention Feature Fusion (cited by 1)
Authors: 厍向阳, 刘哲, 董立红. 《计算机工程与应用》 (CSCD core), 2024, Issue 1, pp. 198-206.
Abstract: To improve the currently low detection accuracy for small-scale and long text, a scene text detection algorithm based on multi-scale attention feature fusion is proposed. With Mask R-CNN as the baseline model, Swin Transformer serves as the backbone for low-level feature extraction. In the feature pyramid network (FPN), multi-scale attention heat maps are fused with low-level features through lateral connections so that each detector level focuses on targets of a specific scale, and the relations between attention heat maps of adjacent levels realize vertical feature sharing within the FPN structure, avoiding inconsistent gradient computation across levels. On the ICDAR2015 dataset the method reaches 88.3% precision, 83.07% recall, and an F-measure of 85.61%, and it also performs well against existing methods on the curved-text datasets CTW1500 and Total-Text.
Keywords: scene text detection; Mask R-CNN; Swin Transformer; attention mechanism; multi-scale feature fusion
17. Wind Turbine Bearing Fault Diagnosis Based on AM and CNN Multi-Level Feature Fusion (cited by 1)
Authors: 王进花, 韩金玉, 曹洁, 王亚丽. 《太阳能学报》 (EI, CAS, CSCD core), 2024, Issue 5, pp. 51-61.
Abstract: An attention-based multi-level feature fusion convolutional neural network (A2ML2F-CNN) fault diagnosis method is proposed. Taking raw current and vibration signals as input, attention-CNN (AMCNN) modules first extract features from each signal and connect them in a first-level feature fusion. On this basis, an attention 1D convolutional network (AM1DCNN) and a 2D convolutional network (2DCNN) extract further information and perform a second-level fusion, overcoming the insufficient fault information of single-sensor data and the difficulty of extracting complementary features. Finally, a fully connected layer and a Softmax layer perform classification to produce the diagnosis. Validated on the Paderborn dataset and compared with CNN, LSTM, and SVM methods, the proposed method improves diagnosis accuracy by 1.8, 3.2, and 4.8 percentage points, respectively, verifying its effectiveness.
Keywords: wind turbine; fault diagnosis; feature fusion; attention mechanism; convolutional neural network; wind turbine bearing
18. Safety Helmet Wearing Detection Algorithm Based on Improved YOLOv5 (cited by 1)
Authors: 雷建云, 李志兵, 夏梦, 田望. 《湖北大学学报(自然科学版)》 (CAS), 2024, Issue 1, pp. 1-13.
Abstract: To reduce false and missed detections in safety-helmet wearing detection, an improved detection algorithm based on the YOLOv5 model is proposed. The improved model introduces a multi-scale weighted feature fusion network: a shallow detection scale is added to the YOLOv5 structure and feature weights are introduced for weighted fusion, forming a new four-scale detection structure that effectively improves the extraction and fusion of shallow image features. An SENet module added to the BottleneckCSP structure of the YOLOv5 neck makes the model attend more to target information and ignore background information. For high-resolution images, an image slicing layer avoids the heavy loss of small-target feature information caused by repeated downsampling. Trained and tested on a self-made helmet dataset, the improved model reaches 97.06% mAP and 92.54% recall, up 4.74% and 4.31% over YOLOv5, respectively. The results show the improved algorithm effectively boosts helmet-wearing detection performance and can accurately identify workers' helmet use, greatly reducing safety risks on construction sites.
Keywords: object detection; multi-scale weighted feature fusion; attention mechanism; image slicing
19. MFE-YOLOX: Dense Small Target Detection Algorithm for UAV Aerial Photography (cited by 2)
Authors: 马俊燕, 常亚楠. 《重庆邮电大学学报(自然科学版)》 (CSCD core), 2024, Issue 1, pp. 128-135.
Abstract: UAV aerial images feature large scale variation and mostly small, dense targets. A mix feature enhancement (MFE) structure is proposed to address this. An attention mechanism added to the super-resolution stage strengthens small-target information extraction; a new feature-layer fusion computation improves fusion efficiency between feature layers, raising detection accuracy for small and medium targets; and a receptive-field enlargement layer at the network tail widens the receptive field of the final feature layers, so the detection head receives rich object information for localizing and distinguishing dense objects. On the VisDrone2021 test set, the MFE-YOLOX network achieves an AP50 of 47.78%, a 9.43 percentage-point gain in accuracy over the original network with similar parameter count and computation.
Keywords: small target detection; UAV; attention mechanism; feature fusion; YOLOX
20. Temporal Multimodal Sentiment Analysis Based on a Composite Cross-Modal Interaction Network (cited by 1)
Authors: 杨力, 钟俊弘, 张赟, 宋欣渝. 《计算机科学与探索》 (CSCD core), 2024, Issue 5, pp. 1318-1327.
Abstract: In multimodal sentiment analysis, semantic differences between modalities lead to insufficient fusion and weak interaction. By analyzing the latent correlations between modalities, a temporal multimodal sentiment analysis model based on a composite cross-modal interaction network (CCIN-SA) is built. The model first uses bidirectional gated recurrent units and multi-head attention to extract temporal features of the text, visual, and audio modalities with contextual semantic information. A cross-modal attention interaction layer then uses low-order signals of the auxiliary modalities to repeatedly reinforce the target modality, so the target modality learns auxiliary-modality information and captures latent inter-modal adaptability. The enhanced features feed a composite feature fusion layer, where a condition vector further captures inter-modal similarity, strengthens the correlation of important features, and mines deeper inter-modal interactions. Finally, multi-head attention concatenates the composite cross-modal reinforced features with the low-order signals, raising the weight of important intra-modal features and retaining the information unique to each initial modality before the final sentiment classification. Evaluation on the CMU-MOSI and CMU-MOSEI datasets shows CCIN-SA improves accuracy and F1 over existing models, effectively mining inter-modal correlations for more accurate sentiment judgments.
Keywords: cross-modal interaction; attention mechanism; feature fusion; composite fusion layer; multimodal sentiment analysis